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ABSTRACT 

A measurement of the cosmological 21 cm signal remains a promising but as-of-yet unattained am¬ 
bition of radio astronomy. A positive detection would provide direct observations of key unexplored 
epochs of our cosmic history, including the cosmic dark ages and reionization. In this paper, we con¬ 
centrate on measurements of the spatial monopole of the 21cm brightness temperature as a function 
of redshift (the “global signal”). Most global experiments to date have been single-element experi¬ 
ments. In this paper, we show how an interferometer can be designed to be sensitive to the monopole 
mode of the sky, thus providing an alternate approach to accessing the global signature. We provide 
simple rules of thumb for designing a global signal interferometer and use numerical simulations to 
show that a modest array of tightly packed antenna elements with moderately sized primary beams 
(full-width-half-max of ~ 40°) can compete with typical single-element experiments in their ability to 
constrain phenomenological parameters pertaining to reionization and the pre-reionization era. We 
also provide a general data analysis framework for extracting the global signal from interferometric 
measurements (with analysis of single-element experiments arising as a special case) and discuss trade¬ 
offs with various data analysis choices. Given that interferometric measurements are able to avoid a 
number of systematics inherent in single-element experiments, our results suggest that interferometry 
ought to be explored as a complementary way to probe the global signal. 

Subject headings: Reionization, dark ages, first stars — techniques: interferometric 


1. INTRODUCTION 

While recent years have marked tremendous progress 
in astronomical measurements at increasingly high red- 
shifts, still missing are direct observations of our Uni¬ 
verse when the first generation of luminous objects were 
being formed. Such observations would provide con¬ 
straints on crucial periods in our cosmic timeline, in¬ 
cluding the epoch of reionization, when the intergalactic 
medium (IGM) experienced a large-scale phase transi¬ 
tion, changing from neutral to almost fully ionized. Op¬ 
tical and infrared observations at z < 7 have provided 
some constraints on the end stages of reionization (Fan 
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make observations difficult to interpret (Dijkstra et al. 
2014 Taylor & Lidz 2014). Cosmic microwave back- 
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1), but the resulting 


one’s line-of-sight and are at best rather coarse probes 
of the relevant astrophysics. These existing probes have 
even more difficulty pushing beyond reionization to the 
earlier epoch known as the dark ages, during which time 
the first stars were formed. 

One promising way to directly observe both reioniza- 
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tion and the dark ages would be to make use of t he 21 cm 
hyperfine transition of hydrogen (Madau et al. 1997). 


By statistically measuring the brightness temperature of 
the 21 cm line, one probes both the distribution of large 
scale structure (using atomic hydrogen as a tracer) and 
the ionization state of the IGM. Given the abundance of 
neutral hydrogen at a broad range of redshifts through 
the end of reionization, the 21 cm line is an ideal way to 
place direct constraints on the first luminous objects and 


how they affected their surroundings (see, e.g., Furlan- 
etto et al. H20061 iMorales fe Wyithel 2010| Pritchard &: 


Loeb||20i2 Loeb fc Furlanetto||2013 for reviews). More- 


over, the spectral nature ot 21 cm measurements not only 
allows a three-dimensional reconstruction of the bright¬ 
ness temperature distribution, but also provides infor¬ 
mation on its evolution. 

Just as with the CMB, the 21 cm line can be charac¬ 
terized by a mean brightness temperature (obtained by 
averaging the cosmological signal in angle over the entire 
sky) and anisotropic fluctuations about this mean. How¬ 
ever, unlike the CMB, the mean 21cm brightness tem¬ 
perature does not follow a simple blackbody spectrum. 
Instead, this “global 21 cm signal” is richly depende nt o n 
the astrophysics of the dark ages and reionization (Shaver] 

Bi 


et al.| |1999| Pri tohard fc Loeb|2010| [Morandi fc Barkana] 


20121. Figure |l| shows a schematic of a fiducial model 

of the global 21 cm signal, highlighting the important 
epochs and corresponding features in the signal. The 
first epoch, the cosmic “dark ages,” arises with the ther¬ 
mal decoupling of the 21 cm spin states from the cosmic 
microwave background (CMB) and is marked by a shal¬ 
low absorption feature. As the gas density continues to 
fall with the universe’s expansion, collisions are no longer 
able to couple the spin states to the gas, and the signal 
falls back into coupling with the CMB. The next epoch 
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Fig. 1.— A fiducial model of the global 21 cm signal, as modeled in|Pritchard & Loeb||2010]l. Although this model should capture the 
essential features of the signal, the precise details have yet to be confirmed and depend on the nature of the first stars and galaxies. 


is marked by the formation of the first stars and galax¬ 
ies, whose Lya photons strongly couple the spin states to 
the gas temperature. This first results in a deep absorp¬ 
tion feature, as the gas te mperature is far below t hat 
of the CMB. (Although see Gnedin & Shaver 2004 for 
an example where shock heating can reduce the depth 
of the feature). Eventually, heating from X-ray emission 
pushes the gas above the CMB temperature, resulting in 
a 21 cm emission signal. This leads to the final epoch, 
the “epoch of reionization,” where UV photons ionize the 
gas, gradually erasing the 21 cm signal. 

A measurement of the 21 cm global signal would also 
have the potential to rule out other models such as those 
involving dark matter annihilations or stellar black holes. 
Dark matter annihilation scenarios provide heating be¬ 
yond that from X-ray emission and hence dampen the 
absorption and emission signals at the end of th e da rk 
ages and during the epoch of reionization (Valdes et al. 


2013 Evoli et al.|2014 1. Similarly, ionizing photons from 


accreting stellar black holes might als o add significant 
heating. Work by |Mirabel et al. (2011) suggests that in¬ 
cluding the effects of black holes would cause the 21 cm 
emission signal to occur earlier than otherwise expected 
and would also shorten the width of the absorption fea¬ 
ture. A global 21 cm measurement would provide evi¬ 
dence for or against such models. 

Unfortunately, the low-frequency observations required 
for a measurement of the global 21cm signal are techni¬ 
cally challenging. For example, the lowest fre quencies 


are strongly affected by ionospheric fluctuations dVedan - 
tham et al. 2014 Datta et al. |2014 |Rogers et al ||2()fT ~ 
which imprint systematics m the final measured spectra. 


For this reason, the highest redshift dip (labeled “Dark 
Ages” in Figure [l]) will be extremely difficult to measure, 
and in this paper we concentrate on forecasts for obser¬ 
vations targeting the absorption feature at ~ 70 MHz 
(henceforth denoted the “pre-reionization dip”) and the 
gradual decay of the signal due to reionization from 100 
to 200 MHz. 

In addition, foreground contamination is a serious con¬ 
cern. Consider Figure [2j for example, where we show a 
model of Galactic sy nchr otron radiation at 150 MHz from 
de Oliveira-Costa et ah (j2008|. Even in the coolest parts 
of the sky (e.g., the Galactic poles), the synchrotron fore¬ 
grounds dwarf the cosmological signal, as we can see from 
examining the scales on Figure |T) Moreover, foregrounds 
get brighter as one moves to lower and lower frequencies, 


Galactic Foregrounds at 150MHz 
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Fig. 2.— An em pirically motivated model o f Galactic synchrotron 
emission from |de Oliveira-Costa et al. || |2008|. Foregrounds such as 
Galactic syncnrotron radiation dominate the cosmological signal 
and must be removed from the data. 


which again makes observations increasingly challenging 
as one moves to higher and higher redshifts. The same is 
true for other foreground sources, such as extragalactic 
point sources, whether resolved or part of an unresolved 
background. 

To access the cosmological signal, one must subtr act or 


fit ou t the foregrounds. Some proposals (e.g., Liu et al. 


2013) take advantage of the angular structure of fore- 


grounds to aid foreground mitigation, with the reason¬ 
ing that any non-monopole signature on the sky cannot 
be the cosmological global signal. However, since the 
foregrounds do contain a monopole mode, there must 
ultimately be some subtraction of foregrounds in the fi¬ 
nal spectra. Most data analysis pipelines assume that 
foregrounds are spectrally smooth, and access the cos¬ 
mological signal by fitting smooth functional forms to 
the initial spectra. Key to the success of this strategy 
is the assumption that the foregrounds can be modeled 
with a relatively small number of parameters. Other¬ 
wise, there is the danger of overfitting the spectrum, 
which destroys cosmological information. For this as¬ 
sumption to hold true, instrumental systematics must 
be exquisitely controlled. An unsmooth spectral ripple 
in the instrument, for example, will imprint chromatic 
signatures in the measured foregrounds, increasing the 
number of fitting parameters needed for their removal. 
Although there may be ways to mitigate t hese system- 
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atics in data analysis ((Switzer & Liu 2014|, it is best to 
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not incur them in the first place. 

There are currently a large number of experiments 
seeking to make a first detection and characterization 
of the global 21cm signal. The Experiment to Detect 
the Global EoR Signal (ED GES) uses an extremely w ell- 
calibrated single element (Rogers & Bowman 2012) to 
integrate over large parts ol the sky, producing a global 
spectrum from 100 to 200 MHz. Modeling foreground 
spectra as a sum of low-order polynomials, EDGES has 
placed a low er limit of A z > 0.06 on the duration of 
reionization (Bowman & Rogers 2010). Similar in con¬ 
cept but operating at a lower frequency range of 55 to 
99 MHz is the Sonda Cosmologica de las Islas para la 
Detection de Hidrogeno Neutro (SCI-HI) experiment. 
This frequency range corresponds to the redsliift range 
13.3 < x < 24.9, providing access to the prominent dip in 
the signal prior to reionization. Using a similar polyno¬ 
mial foreground subtraction technique to EDGES, SCI- 
HI is able to a chieve a foregr ound residual level of ~ 10 K 
at ~ 70 MHz ( Voytek et al.||2014 1. Other single-element 
systems include the Shaped Antenna m easurement of the 
background RAdio Spectrum (SARAS; iPatra et al. 12013 1 
and Broadband Instrument f or Global HydrUgen Re ioTV- 
isation Signal (BIGHORNS; |Sokolowski et al. 


2014). 


To escape radio frequency interference (REl), most of 
these experiments are deployed in remote locations. For 
example, EDGES observes from the Murchison Radio- 
astronomy Observatory in Western Australia, while SCI- 
HI has been deployed at Isla Guadalupe in Mexico, with 
plans to observe at Isla Socorro and/or Isla Clarion in 
the future. To achieve even better RFI isolation, as well 
as to escape ionospheric distortions, the Dark Ag es Ra¬ 


dio Explorer (DARE) satellite has been proposed (Burns 
et al. 2012 Harker et al.|2012 ). DARE consists of a short 


dipole antenna in lunar orbit, which allows the Moon to 
be used as an RFI shield. DARE probes a frequency 
range of 40 to 120 MHz, again providing direct access to 
the pre-reionization epoch. 


Moving b eyond single elem e nt ex periments, Mahesh 


et al. (2014) and Singh et al. (2015) have explored the 


possibility of extracting an auto-correlation from a cross¬ 
correlation of two elements by using a resistive fence 
to act as a radio-wavelength analog to a beam-splitter. 
Placing the fence between the two elements of a baseline 
effectively creates a zero-baseline interferometer, which 
is sensitive to the monopole. Continuing to increase the 
number of elements, the Large-aperture Experiment to 
detect the Dark Ages (LEDA) makes use of a full interfer¬ 
ometric array of antennas to si multaneously model th e 
sky and calibration parameters (Bernardi et al. 2014). 
Fundamentally, however, its measurement of the global 
signal is still expected to come from total power mea¬ 
surements (i.e., auto-correlations) from single elements 
treated i ndependently. This dif fers f rom the approach 
taken by IMcKinley et al. (2013) and Vedantham et al. 
(2015), where the L(Jw Frequency ARray (LOFAR) was 
operated as a true interferometer not just for calibration 
purposes, but also for the cosmological measurement it¬ 
self. At a basic level, one might imagine that interfer¬ 
ometers are sensitive only to spatially fluctuating sig¬ 
nals on the sky (if one follows the standard procedure 
of avoiding noise bias by discarding auto-correlations in 
the data), and are therefore insensitive to the global sig¬ 
nal. However, an externally imposed spatial dependence 


can introduce sufficient spatial structure for a glo bal sig¬ 
nal to be me asurable by an interferometer. Vedantham 


et al. (2015) took advantage of this fact by observing 


fields containing the Moon, effectively using lunar occul- 
tations to introduce the necessary spatial dependence for 
an interferometric measurement of the global signal. So 
far, this approach has yielded a reasonably high signal- 
to-noise characterization of the foreground contaminants 
between v = 35 and 80 MHz. 

While perhaps slightly more complicated to construct 
than single-element experiments, interferometers can po¬ 
tentially provide easier control of certain instrumen¬ 
tal systematics. For example, by omitting the auto¬ 
correlation mode of an antenna with itself, an interfero¬ 
metric experiment avoids the systematic noise bias that 
would have to be modeled and subtracted off in a single¬ 
element experiment. This noise, which arises in ampli¬ 
fication stages, typically has significant spectral struc¬ 
ture that can be crippling for a global 21 cm experiment. 
Flux scale calibration may also be easier with multiple 
elements, since the elements can be coherently phased 
to bright astronomical s ources with known positio ns. In 


this paper, we build on Vedantham et al. (20151, gen¬ 
eralizing their work to consider a general theory of in¬ 
terferometric global signal measurements. We provide 
a mathematically rigorous framework for extracting the 
global signal from an interferometer. We also provide 
guiding principles for the design of a global signal inter¬ 
ferometer. Performing numerical simulations of a fidu¬ 
cial interferometer, we find that interferometry can be a 
competitive way to probe the global signal. 

The rest of this paper is organized as follows. In Sec¬ 
tion [2] we offer some qualitative intuition for using in¬ 
terferometry to measure monopole signals and provide 
rules-of-thumb for the design of a global signal interfer¬ 
ometer. Section [3] establishes a general framework for 
data analysis, providing a convenient language for con¬ 
sidering various data analysis trade-offs and choices. In 
Section El we perform numerical simulations to compare 
the performance of single-element experiments and inter¬ 
ferometers, before summarizing our conclusions in Sec¬ 
tion 0 


2. BACK-OF-THE-ENVELOPE ARRAY DESIGN 

In this section, we use simplified toy models to build 
intuition for the types of interferometer arrays that are 
best suited for probing the global signal. The goal here 
is to provide a rough sense for what might be a sensi¬ 
ble design, which we will analyze in a more numerically 
detailed fashion in the rest of the paper. 

2.1. A semi-qualitative picture of a global signal 
interferometer 

Consider first a purely qualitative picture of a two- 
element interferometer. An interferometer returns a vis¬ 
ibility by integrating over a primary-beam attenuated 
fringe pattern on the sky. If the primary beam did not 
exist (i.e., if it were constant over the entire sky) and 
the sky were infinite in extent, it would be impossible 
to measure a monopole. With those approximations, the 
fringe pattern would be purely sinusoidal, and integrat¬ 
ing such a pattern against a constant (monopole) mode 
would return zero. However, as shown schematically in 
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Fig. 3.— A schematic illustration of a two-element interferome¬ 
ter. The two antennas, separated by a baseline length b, measure 
a fringe pattern on the sky, shown as a cosine function on an arc. 
This pattern would integrate to zero over the sky if there were 
no primary beam and the sky were infinite in extent. However, 
both these assumptions are violated in practice: the curved sky 
is finite, and the antennas also produce a beam pattern that at¬ 
tenuates sensitivity to certain parts of the sky (e.g., the horizon 
for zenith-pointing elements). The result is a pattern that does 
not integrate to zero and an interferometer that is sensitive to the 
monopole mode. 


Figure [3j the enveloping presence of the primary beam 
and the finite extent of the sky prevent the interferom¬ 
eter’s response from integrating to zero. This allows an 
interferometer to be sensitive to the monopole mode. 

An equivalent way to look at the problem is to move 
into Fourier space and to examine the uv plane, as shown 
in Figure[4] An interferometer with baseline length b will 
measure the sky at the point u = vh/c in the uv plane. 
To measure the monopole, one must recover information 
from the origin, which corresponds to a zero-length base¬ 
line. This is possible because multiplying the sky by the 
primary beam in image space corresponds to convolv¬ 
ing by the primary beam’s uv plane footprint in Fourier 
space. This footprint is typically on the order of the 
physical size of the antenna element, measured in wave¬ 
lengths. Its effect is to smear out the point of measure¬ 
ment on the uv plane (i.e. the measurement incorporates 
information from other nearby uv locations), with nar¬ 
rower image-space beams corresponding to larger smears. 
As shown by the green circle in Figure [4j if the base¬ 
line is short enough and the image-space beam is narrow 
enough, then the measured signal will include informa¬ 
tion from the global signal. 

At first sight, the result that an interferometer can be 
sensitive to the monopole of the sky seems to contradict 
standard ideas in interferometry. To help clarify this 
point of confusion, let us first review the standard lore. 
The van Cittert-Zernicke Theorem states that the cross¬ 
correlation between the electric field at locations Xi and 
X 2 is given by 


(£(xi)F;(x2)*) oc 


dfiT(r)exp — * 27 t— r • (xi 
1 c 



(!) 

where we have opted to describe the sky in terms of its 
brightness temperature rather than the specific intensity, 
in order to conform to our convention in later sections. 



Fig. 4.— A schematic illustration of the uv plane for an interfer¬ 
ometer. The uv plane is the Fourier dual to image space, and an 
interferometer with baseline b makes a measurement at the point 
u = vh/c. Measuring the monopole signal requires making a mea¬ 
surement at the origin of the uv plane, which is probed by a base¬ 
line of zero length. This seems impossible without measuring the 
auto-correlation of an antenna with itself, until one recalls that a 
real interferometer does not measure an exact point: the antenna’s 
beam has a uv plane footprint that smears the measurement, so 
that the measurement contains information from nearby uv points. 
Since a narrower beam corresponds to a larger convolving kernel, 
two narrow-beam antennas placed close together would make a 
measurement that incorporates information from the monopole, as 
shown by the green circle in the figure. Antenna size scales both 
the center point and the extent of the uv kernel, thus leaving the 
response to the global signal approximately the same. The scales 
on the uv axes shown here are thus somewhat arbitrary. 

With an interferometric baseline, one correlates not the 
electric field at two specific points, but rather, the inte¬ 
grated electric fields over the physical areas of the anten¬ 
nas forming the baseline. We therefore measure 

V oc J d 2 Xid 2 x 2 /(xi)g(x 2 )(-E(xi)F;(x 2 )*), (2) 

where we have assumed that the antennas are coplanar, 
allowing us to integrate only over two-dimensional ver¬ 
sions of Xi and X 2 . The functions / and g describe the 
electric field sensitivities of the first and second antennas 
comprising the baseline, respectively. They are assumed 
to be equal to zero outside the rough physical extent 
of the antennas. Suppose the sky emission is constant 
(i.e., consisting only of a monopole), so that T( r) = T 0 . 
Combining our expressions with this restriction and the 
explicit notation r = ( l,m,n ), we obtain 

V oc To / - 7 =^^= f{lv/c,mv/c)g*{lv/c,mv/c), 

( 3 ) 

where 

f (^, dxdyf(x,y)e ~ i2 ^ lx+m »> (4) 

is the two-dimensional Fourier transform of /, and simi¬ 
larly for g , with x = ( x,y). 

Now, suppose we operate under the flat-sky approxi¬ 
mation, which as we shall see, is inappropriate for global 
signal interferometry. This is equivalent to omitting the 
factor of \/l — l 2 — to 2 in the denominator of our expres¬ 
sion for the visibility V. If one further extends the limits 
of the integration to infinity, Parseval’s theorem applies 
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and one obtains 


V 


cx T 0 


flat, finite 


dx/(x) 5 (x). 


( 5 ) 


If the two antennas that form our baseline do not overlap, 
then neither will / and g , resulting in /(x)g(x) = 0 and 
thus V = 0. This is the standard result that suggests 
that it is impossible to measure the monopole with an 
interferometer. Phrased differently, the physical size of 
the antennas (/ and g) make it difficult to have a baseline 
short enough for there to be substantial overlap with the 
u = 0 mode in Figure |4j 

With the full expression for V, however, one sees that 
the response to the monopole does not necessarily van¬ 
ish, even if the antennas are not physically co-located. 
Abandoning the assumptions of a flat, finite sky, one may 
define 


F(l, m, v) 


f (Iv / c,mv / c) 
(1—Z 2 —m 2 ) 1 / 4 

o 


if l 2 + m 2 < 1 
otherwise, 


( 6 ) 


and similarly for g. Parseval’s theorem can then be ap¬ 
plied to our expression for V, despite the complications 
of a curved, finite sky. The result (suppressing the fre¬ 
quency dependence for notational simplicity) is 


Vex T 0 


dxF(x)G(x), 


( 7 ) 


and does not vanish because the effective apertures F 
and G will in general overlap. In words, an interferome¬ 
ter is sensitive to the monopole mode because a monopole 
does not appear as a constant as far as the interferom¬ 
eter is concerned. From the interferometer's viewpoint, 
the finite extent of the sky means that it is effectively 
making a measurement on an infinite plane, but one that 
is identically zero beyond a circle of fixed radius given by 
the horizon. The interferometer therefore sees a constant 
sky as having spatial structure, allowing for a non-zero 
response. This response is further enhanced by the geo¬ 
metric effect of projecting a spherical hemisphere down to 
a two-dimensional image plane. Such a projection maps 
large solid angles near the horizon to small portions of 
the image plane, leading to an “edge brightening” effect, 
yet again imprinting spatial structure on the signal so 
that it can be picked up by an interferometer. These 
effects and their importance for an interferometric mea- 
su rement of the global signa l were independently noted 


Thyagarajan et al.| (|2015|). 

We demonstrate our results numerically with a toy 
example in Figure [5] The top panel shows the cross- 
sections of two circular apertures that each have radius 
0.5 m. The apertures are packed as closely together as 
possible without overlapping, and in the infinite, flat-sky 
approximation, Equation (I5| implies that an interfero¬ 
metric baseline formed by pairing these two apertures 
will have no sensitivity to the monopole mode. In the 
bottom panel are the effective aperture functions (i.e., 
the correct ones to use when considering the curved, fi¬ 
nite sky) at 150 MHz for the same configuration. The ef¬ 
fective apertures clearly overlap, and consequently Equa¬ 
tion 0 implies a sensitivity to the monopole]^] Another 

5 The bottom panel of Figure[5]looks superficially like the Fourier 



x [m] 


Fig. 5. — Top: Cross-section of two circular apertures (shown 
in blue and orange) of radius 0.5 m placed side-by-side, with their 
product (identically zero everywhere) in green. Bottom: Cross- 
section of the corresponding effective apertures at 150 MHz, with 
the (now non-zero) product again shown in green. Whereas the lack 
of overlap between the two apertures in the top panel implies that 
interferometers are insensitive to the monopole, this conclusion is 
incorrect when finite curved sky effects are taken into account. 
Incorporating such effects, the effective apertures overlap and give 
rise to a non-zero response to the monopole, which can be obtained 
by integrating the green curve. 


example is provided in Figure [6j where we move beyond 
a toy model and use realistic antenna models (Pober| 
et al. |2012 1 from the Donald C. Backer Precisi on Array 


for Probing the Epoch of Reionization (PAPER; Parsons 
et al.pOlO). Physically, the PAPER antennas are ~ 2 m 


in extent. Placing two antennas as close as possible re¬ 
sults in the effective aperture functions shown in Figure 

t Again, the non-zero overlap results in sensitivity to 
e monopole mode. 

In the rest of Section [2] we will make this rough picture 
more mathematically precise and develop an array design 


transform of the top panel, but we emphasize that this is not what 
is being plotted. To obtain the bottom panel, one Fourier trans¬ 
forms the top panel, ap plies the truncation and edge brightening 
implied by Equation (mb, and then inverse Fourier transforms back 
to the original space, yielding the effective aperture functions used 
in Equation 0 
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Fig. 6.— Similar to the bottom plot in Figure [5] but using re¬ 
alistic beam models from the PAPER array. The antennas are 
once again placed side-by-side. The results shown here are the 
effective apertures for the two antennas at 150 MHz, arbitrarily 
peak-normalized to unity. Again, the overlap implies a sensitiv¬ 
ity to the monopole, with the response being proportional to the 
integral over the product of the two effective apertures (shown in 
green). 

that not only maximizes sensitivity to the global signal, 
but also ensures that subsequent foreground subtraction 
operations are robust. This will require being cognizant 
of the data analysis strategy. This strategy, which we will 
develop fully in Section [3j first analyzes the visibilities on 
a frequency-by-frequency basis to estimate the strength 
of the monopole mode at each frequency. The result is 
a spectrum whose smooth components are then removed 
in an attempt to remove foregrounds. Essential to the 
success of this removal is the assumption that the instru¬ 
ment does not impose complicated frequency structures 
on the true sky spectra. This provides a strong constraint 
on our instrumental design. In service of this goal, when 
analyzing interferometric data we will discard data from 
the auto-correlation of an antenna itself, in order to avoid 
the noise bias effects that pose extra challenges in cali¬ 
bration of single-element experiments. 

2.2. Sparse or packed arrays? 

Suppose we consider an array consisting of just a single 
baseline and ask what baseline length b optimizes recov¬ 
ery of the global signal. Intuitively, a short b increases 
sensitivity to the signal, since (for a given uv plane pri¬ 
mary beam kernel) short baselines have the greatest over¬ 
lap with the u = 0 mode. On the other hand, fore¬ 
grounds contamination is worst for short baselines, since 
they are mostly sensitive to the smoothest spatial modes 
of the sky, where foregrounds dominate. There must 
therefore exist an intermediate baseline length that best 
balances these two competing demands, which we will 
now compute. 

In the flat-sky approximation, the visibility response 
V (b) of a baseline b to the sky temperature T(0) is given 

by 

V(b) = J T(9)A(9) exp (-i27r^b • 0) d 2 d, (8) 

where A(G) is the primary beam pattern. Without loss 


of generality, we may normalize our primary beam such 
that H(0) = 1. In principle, our use of the flat-sky ap¬ 
proximation is inappropriate for a discussion of global 
signal interferometry, given the conceptual picture we 
presented in the previous section. However, we will in¬ 
voke the flat-sky approximation only for the purposes 
of enhancing physical intuition, and the formalism and 
numerical results of subsequent sections will properly in¬ 
corporate the curved sky. In fact, if one prefers, one 
may simply replace all the aperture functions (which en¬ 
ter through the beam patterns) with the effective aper¬ 
ture functions. The rest of the mathematics- including 
what we do in this section —then carries through without 
change. 

For notational compactness, we will not explicitly high¬ 
light the frequency dependence of To, A(9), and V (b), al¬ 
though of course these quantities all implicitly vary with 
frequency. Setting T(9) = T 0 for a monopole signal, one 
obtains the result 

V (b) = To A (vh/c ), (9) 

with A is defined as the Fourier transfornj^jof A(9). This 
suggests that an appropriate (though not necessarily op¬ 
timal) estimator T 0 for the global signal T 0 might be 


f - _ V(b) 

A(vh/c) 


( 10 ) 


Intuitively, a baseline b has sensitivity A(vh/c — u) to 
spatial wavenumber u, so the prescription suggested here 
is to simply divide the measured visibility by the response 
to the u = 0 mode. In the absence of foregrounds and 
noise, this is guaranteed to return the true To. In reality, 
of course, we have contributions from both foregrounds 
and noise. To describe the former, we can write T{9) 
as the sum of T 0 and a foreground contribution Tf g (0). 
This then yields a foreground perturbation Vf g (b) to the 
visibility, of the form 

^fg( b ) = J T tg (u)A - uj d 2 u, (11) 


where we have applied the convolution theorem to Equa¬ 
tion ([8]), with Tf g denoting the Fourier transform of the 
foreground temperature field. With this perturbation to 
V(b), our estimator contains more than just the contri¬ 
bution from the true global signal: 


T 0 = T 0 + 


A (vh/c 




— u I d 


( 12 ) 


where we have included an additive instrumental noise 
contribution n to the visibility. Taking the ensemble av¬ 
erage of both sides and assuming that the noise averages 
to zero (i.e., there are no persistent instrumental sys- 
tematics such as crosstalk), it follows that the average 
deviation ATq from the truth is given by 


AT 0 = (T 0 )-T 0 


Tfg^b/c) 


A |A-„ 


(13) 


A (izb/c) 

B Throughout this paper, we adopt a Fourier convention 
where f(0) has a Fourier transform /(u) = J /(0)e _l2lru ' e d?6 . 
The inverse transform is correspondingly given by f(0) = 

f f(u)e i2nu d d 2 u. 
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where we have assumed that the primary beam A is a 
relatively broad function on the sky, resulting in a com¬ 
pact uv plane footprint A. This allows the factor of Tf g 
in Equation ([12] ) to be evaluated at u = vh/c and fac¬ 
tored out of tn e integral. What remains in the integral in 
Equation ( |13[ ) is simply the integral of the primary beam 
kernel over the entire uv plane, which equals A(0) = 1. 
We thus have 


AT n = 


TfgWc) 

A (vh/c) 


(14) 


This represents the bias that foregrounds introduce into 
our estimate of the global signal, which we can seek to 
minimize by varying b. 

For illustrative purposes, let us consider specific mod¬ 
els for A and Tf g . If the primary beam is taken to be a 
2-D Gaussian with a width of 0/, then our normalization 
convention dictates that A(u) = 2n0/ exp(— 2tt 2 0/u 2 ), 
where u = |u|. As for Tf g , one can imagine the fore¬ 
grounds to have statistical properties described by some 
angular power spectrum Cfo Often, Ci is fit by a power 
law so that Ci oc £~ a , where a is typically between 2 
and 3 (depending on various factors such as frequency 
and Galactic latitude). In the flat-sky approximation, 
we have i ~ 2nu, which allows us to translate the angu¬ 
lar power spectrum into a power spectrum P(u) oc u~ a 
on the uv plane. Given this, it is reasonable (on dimen¬ 
sional grounds) to take Tf g (u) a u~ a / 2 , which yields 


A T 0 (b) oc ( ^ 


-a/2 


exp ( 27 t 2 0/- 


(15) 


where b = |b|. Minimizing this expression by differenti¬ 
ating with respect to b gives an optimal baseline length 
fopt of 


L'opt 

"a - 


1 


2 t rO b 


(16) 


Based on the discussion of aperture sizes in Section [2TT 
we immediately recognize the factor of (2\/27r0&) 


as 


being the characteristic “radius” of a receiving element 
in units of wavelength. Now, the remaining factor \J~ol is 
of order unity and less than 2. Since one cannot place 
antenna elements closer than their diameters)^] we can¬ 
not in fact achieve the optimal baseline length suggested 
by Equation (16). However, we can come close to this by 


placing the antennas as close together as is physically 
possible. Essentially, our optimization suggests such 
a compact configuration because the foreground power 
does not decrease dramatically with increasing spatial 
wavenumber (i.e., we never have a» 1), so the reduced 
sensitivity to the global signal from having a longer base¬ 
line is not worth the relatively small decrease in fore¬ 
ground contamination. 

In the derivation that we have just presented, we fo¬ 
cused exclusively on minimizing the systematic bias that 

7 Note that when dealing with elements like dipole antennas (as 
opposed to aperture-like elements such as dishes), the electrical 
response can be larger than the physical antenna size. With careful 
ante nna design, it may therefore be possible to abide by Equation 

Jl6l. 


would result from foreground contamination. Alterna¬ 
tively, we could have instead chosen to minimize the vari¬ 
ance (i.e., the error bars) on our estimator To. Unlike the 
bias, the variance contains a noise term, since (n) = 0 in 
the absence of systematics, but (|n| 2 ) will be non-zero. 
This will tend to reduce the optimal baseline length, 
given that short baselines increase signal-to-noise. But 
since Equation (161 predicts close to the shortest possible 


baseline anyway, our minimum-bias solution also serves 
as an excellent approximation to a minimum-variance so¬ 
lution. 

Making a slight leap from a single baseline to a full in¬ 
terferometer array, this section argues for a packed array, 
where antenna elements are placed as close together as 
possible. A packed array naturally results in a regular, 
periodic arrangement of antennas, giving a large number 
of identical copies of our single (short) baseline. Our con¬ 
clusion then rests on the assumption that a large regular 
array is essentially just that—a large collection of re¬ 
peated, short baselines—and no more. In general, this is 
not a good description of an array, since for large arrays, 
even close-packed antenna configurations provide longer 
baselines that might provide valuable information about 
foregrounds for advanced data analysis techniques. In 
terms of the cosmological global signal, however, longer 
baselines have very little sensitivity to the signal of in¬ 
terest. We can see this from Equation where A is 
typically a function that drops off away from the origin, 
so that as one increases b from zero (i.e., a single element 
experiment) to a short baseline to a long baseline, the vis¬ 
ibility response to the monopole T 0 drops. For measuring 
the signal, long baselines therefore contribute negligibly, 
and a large array can be thought of as simply a large 
collection of multiple short baselines. We can therefore 
make the leap from the single baseline derivations of this 
section to argue that packed arrays are desirable. 

2.3. Wide beams or narrow beams? 

The arguments in the previous subsection suggested a 
particular relative placement of antenna elements: an¬ 
tennas should be packed together as tightly as possible. 
However, the absolute scale of the array remains unspeci¬ 
fied. Primary beams smaller than ~ 1—2° will likely have 
difficulty distinguishing a re presen tative 21 cm global sig¬ 
nal from local fluctuations (Bittner & Loeb 2011). How¬ 
ever, that constraint still leaves a large range of potential 
primary beam sizes and baseline lengths. In this section, 
we answer the question of whether it is better to have 
a packed array with physically small antenna elements 
(and therefore short baselines and wide primary beams), 
or a packed array with larger elements (and therefore 
longer baselines and narrow primary beams). We will ul¬ 
timately find that although narrowest beams on longest 
baselines maximize raw foreground reduction, they also 
introduce spectral ripples that are difficult to remove. 
Hence, we will find intermediate beam and baseline sizes 
to be optimal. 

As a first guess, one could imagine insertin g ou r expres¬ 
sion for 6 opt , Equation (16), into Equation (15) to yield 
an equation whose only tree parameter is the primary 
beam size 0$. Minimizing this equation by varying the 
beam size then suggests that the beam ought to be made 
as small as possible. However, since any discussion of an 











array’s absolute size will necessarily tie the array to abso¬ 
lute angular scales on the sky, a more nuanced discussion 
of foreground properties is required beyond the set-up in 
the previous subsection, which only required that the an¬ 
gular power spectrum of foregrounds was monotonically 
decreasing. 

One important property of the foreground sky is the 
fact that it is not rotationally invariant—the galactic 
plane, for example, is far brighter than the galactic poles. 
This is not captured by the angular power spectrum of 
foregrounds, which abuses the notion of a power spec¬ 
trum by assuming statistical isotropy for a sky that is 
clearly anisotropic. A global signal experiment with a 
narrow beam can take advantage of cooler regions in the 
galaxy, selectively observing only where foregrounds are 
known to be dimmer, since the cosmological global sig¬ 
nal is by definition the same everywhere on the sky. The 
narrower the primary beam, the more selectively one can 
implement such an angular foreground avoidance scheme, 
and the lower the foreground contamination. Arrays with 
narrow primary beams, large antennas, and long base¬ 
lines therefore see dimmer foregrounds for two reasons: 
the narrow primary beam allows for cleaner selections 
of cool patches of the sky, and the necessarily longer 
baselines also sample foregrounds on finer angular scales 
(higher £), which are weaker because Cg is a decreasing 
function for galactic foregrounds. 

Narrow primary beams, however, are not without their 
drawbacks. Angular avoidance strategies alone cannot 
mitigate foregrounds to the level required for a detec¬ 
tion of the cosmological global signal. An angular avoid¬ 
ance strategy in principle allows the rejection of any fore¬ 
grounds that are not spatially constant (i.e., are not the 
monopole), but are unable to remove the monopole com¬ 
ponent of foregrounds. Put another way, an observa¬ 
tional strategy that avoids the strongest foregrounds will 
reduce the magnitude of foreground contamination con¬ 
siderably, but will at best only be able to reduce the 
contamination to the minimum foreground temperature 
on the sky, which can still be much brighter than the cos¬ 
mological signal. Ultimately, one must therefore also rely 
on spectral foreground subtraction methods, and this is 
where narrow beams and long baselines may not be ad¬ 
vantageous. Spectral subtraction typically exploits the 
intrinsic smoothness of foreground spectra, projecting 
out smooth components of the data. For such a pro¬ 
cedure to be successful, one must avoid having an in¬ 
strument that imprints extra spectral features into the 
data. Long baselines are particularly prone to such im¬ 
prints, since the angular mode number £ ~ 27 til ~ 2nb/X 
probed by a baseline b varies more rapidly with frequency 
(or wavelength) when b is large, allowing non-uniform 
spatial features of the sky to couple more strongly into 
spectral ripplesj^] Such spectral ripples will survive a 


spectral foreground subt raction that 


modes (although see Switzer & Liu 
for how these ripples can be mode 


projects out smooth 


2014 for a proposal 
ed in situ from the 


data itself), leaving residuals that may be indistinguish¬ 
able from the cosmological signal. Combining this with 
our previous discussion, we see that an array with longer 
baselines and narrower primary beams may see dimmer 
foregrounds prior to spectral foreground subtraction, but 
may imprint spectral signatures that result in greater 
post-subtraction foreground residuals. An optimal array 
is one with a beam size that is chosen to balance these 
two competing demands. 

Because spatial features of the foreground sky such as 
the Galactic plane are difficult to model statistically, nu¬ 
merical simulations are required to find the right bal¬ 
ance in primary beam size. To perform such simula¬ 
tions, we first form simulated foreground skies between 
100 a nd 200 MHz by extrapolating the 408 MHz Haslam 
map (Haslam et al. 1982) pixel-by-pixel using a power- 
law-like relation 


408 MHz 




T(v) = THaslam(f) ( 

(17) 

where THasiam(r) represents the Haslam map and ao is 


held fixed at —2.5 (|Liu fc Tegmark 


2012 ), whereas oq, a 2 , 


and 0:3 are drawn pixel-by-pixel from zero-mean Gaus¬ 
sian distributions with standard deviations of 0.1, 0.03, 
and 0.01, respectively. Conceptually, higher curvature 
components to the spectrum are given less weight, in 


Oliveira-Costa et al. (12008 

and its mathematical inter- 

pretaflon in Liu & Tegmar 

(2012 

). It is important that 


our numericaT simulations are based on extrapolations 
with pixel-to-pixel variations. Th is will make the nu¬ 
merical explorations of Section |3.4| more realistic. There, 
we incorporate the Haslam map as a model of the fore¬ 
ground sky as part of our data analysis pipeline (in an 
attempt to suppress foregrounds), which in reality will 
differ somewhat from the true sky. Pixel-to-pixel varia¬ 
tions give rise to low-frequency maps that roughly look 
like the Haslam map, but with slight differences in their 
details, reflecting the imperfections of any sky template 
we may choose to use. 

We assume that observations are centered on the 
Northern Galactic Pole (NGP) with the extent of the 
field defined by the primary beam of the instrument. We 
take the primary beam to be a Gaussian attenuated by a 
cosine (to ensure that the primary beam vanishes at the 
horizon): 


A(9 , p) = exp 



(18) 


8 Indeed, this is a common concern for 21 cm tomography, and is 
the origin of the “for eground wedge” signature seen in power spec- 
trum measurements (Datta et al. 2010; Morales ct al. 2012 Trott 


et al. 2012; 

Vedantham et al. 

2012 

Parsons et ai.|2012 Thvagara- 


2 




jan et al. 

013 Pober et al. 

20T3 

Liu et al. 2014a b|. However, 


one key dmerence is that most interferometer arrays targeting the 
power spectrum are not tightly packed (though they do tend to 
be quite compact). Such arrays are therefore not subject to our 
constraint that the primary beam width and the baseline length 
vary in a strictly reciprocal fashion. One exception to this is the 
Hydrogen Epoch of Reionzation Array (HERA), which does have 


where 6 b controls the width of the primary beam. We 
assume (rather conservatively) that the beam width is 
proportional to the observation wavelength, and subse¬ 
quent quotations of 6b in this section refer to the beam 
width at the lowest frequency of observation. To measure 

close-packed elements ( [Pober et al.[ 2014| . 
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the global signal, we compute 


W = 


J2j [J A(r,u) exp (i2n^bj • r) dfl] V(bj,v) 


J2 k 1/ -^(r, v) exp (z27r^b fc • r) dfl\ 


(19) 


Although we defer a full discussion of data analysis to 
Section[3] we can understand the essential fea tures of this 
estimator as a generalization of Equation (10). First, this 


estimator does not require the flat-sky approximation. 
In addition, it incorporates a signal-to-noise weighting of 
measurements from different baselines. To see this, note 
that the term in the numerator enclosed by the square 
brackets is precisely the visibility response of a baseline 
to a monopole sky of unit amplitude. Baselines with 
a greater response are given greater weight as visibili¬ 
ties from different baselines are summed together, before 
normalizing the final result. If the array consists of a 
single baseline, the summations in both the numerator 
and denomin ator disappear, and the estimator reduces 
to Equation (10) once the flat-sky approximation is in¬ 
voked. 

Following an initial estimate of the sky spectrum, we 
further suppress foregrounds by fitting a polynomial to 
the logarithm of the spectrum. Subtracting off the 
smooth polynomial fit, one obtains a residual spectrum 


Tres(v) = To(y) - exp 


iVpdy 

E' 

n—0 


iPn (log 2 


( 20 ) 


where p n denotes the nth Legendre polynomial, with a 
corresponding expansion coefficient a n obtained from fit¬ 


ting logT 0 up to order -/V po i y . The set of polynomials 
that one fits to is arbitrary, and our choice of Legendre 
polynomials is simply one of convenience. In fact, there 


ternatives such 

as principal component analyses (Liu & 

Tegmark 2012 

Liu et al. 

2013 

Switzer & Liu 

2014) are 

worth exploring. Whatever foreground mod 

cl is ulti- 


mately used, one must simply take care to ensure that 
any possible loss of cosmological signal (resulting from 
the subtraction of spectral modes that the foregrounds 
and the signal have in common) is accurately quantified. 
We will do so in Section [|J when we propagate numerical 
simulations of fiducial instruments through to astrophys- 
ical parameters. 

In Figure [7] we show simulations of an interferomet¬ 


ric recovery of the global signal, T 0 (;y), averaged over 
10 , 000 random realizations of the simulated foregrounds. 
Because our simulations do not contain any cosmolog¬ 
ical signal, this is equal to A T 0 , the expected fore¬ 
ground bias. Each curve shows the result for a 6 x 6 
square grid of tightly packed antennas with varying 
primary beam widths (and therefore varying baseline 
lengths). We define a “tightly packed array” as one 
where the shortest baselines & s hort are given by & s hort /A ~ 
V8 ln2(2 v / 27r6»o)" 1 . This expression comes from tak¬ 
ing physical extent of a Gaussian aperture (which is 
ultimately just a theoretical construct) to be its full- 
width-half-max (FWHM) . As expected, arrays with 
smaller beams/longer baselines exhibit a lower fore¬ 
ground bias, since our observations are centered around 
the NGP, causing wider beams to pick up more fore¬ 


grounds from lower galactic latitudes, where they are 
typically brighter. 

Figures [8] and [9] show the foreground residuals that 
remain after the subtraction of 8th and 9th order log- 
space polynomials, respectively. One immediately sees 
that whereas the narrowest beams/longest baselines gave 
the dimmest initial pre-subtraction spectra, the post¬ 
subtraction residuals are minimized for intermediate¬ 
sized beams. This is precisely the trade-off that we qual¬ 
itatively alluded to above: the long baselines that in¬ 
evitably come with narrow beams cause low-level chro¬ 
matic ripples in the data that are not easily removed by 
smooth low-order polynomials, while the broad beams 
that come with short baselines incorporate brighter 
lower-latitude foregrounds. Further evidence can be seen 
by comparing Figures [8] and [9] One sees that increasing 
the order of the polynomial fit allows significant further 
suppression of the chromatic residuals introduced by long 
baselines, but only results in slight decreases in residu¬ 
als for the wide beam case. This is because the higher 
residuals for the latter are the result of an overall in¬ 
crease in foreground amplitude, which affects all poly¬ 
nomial orders. We find the optimal beam size of full- 
width-half-max ~ 40° to hold whether considering an 
experiment spanning the 100 to 200 MHz band (target¬ 
ing reionization) or the 50 to 100 MHz band (targeting 
the pre-reionization dip), provided the 40° recommen¬ 
dation refers to the beam size at the lowest observation 
frequency. 



Fig. 7.— The foreground bias ATo as a function of frequency v 
from simulations of an interferometric recovery of the global signal, 
averaged over 10, 000 random realizations of the simulated fore¬ 
grounds. The data is shown for three different 6x6 square grid 
arrays of tightly-packed antennae. The blue, orange, and green ar¬ 
rays are composed of antennae with full-width-half-max (FWHM) 
beam sizes of 13.4°, 38.7°, and 60° at 100 MHz, respectively. Note 
that narrower beams have less pre-subtraction foreground bias, as 
expected. 


Yet another consideration in choosing baseline length 
is instrumental noise. As we have already alluded to, 
short baselines have greater sensitivity to the spatial 
nronopole of the sky, and therefore have better signal-to- 
noise. Thus, if one were to add instrumental noise to the 
preceding discussion, the optimal baseline length would 
shift towards smaller values (with the primary beam cor¬ 
respondingly increasing in width). Picking a baseline 
length based on the foregrounds-only analysis of this sub- 
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Fig. 8.— The foreground residuals after subtracting off a 8th 
degree polynomial in log space. The data are shown for the same 
arrays as from Figure [7] Note that unlike in the case of the raw 
foreground bias, the array with the narrowest beams no longer has 
the least foreground contamination. Instead, the intermediate-size 
beam results in the lowest foreground residuals. This is due to the 
fact that narrower beams come with longer baselines that introduce 
chromatic ripples that are difficult to subtract. 



section is therefore in principle sub-optimal. In practice, 
however, instrumental noise is a sub-dominant contri¬ 
bution to the error budgets of most 21 cm global signal 
experiments, and the optimal baseline length will only 
be slightly shorter than the one advocated here. More¬ 
over, since interferometers consist of many baselines, the 
collective signal-to-noise of an array can compensate for 
a lower signal-to-noise in any individual baseline, a point 
that we will explore in the following subsection. 

2.4. How many elements? 

Thus far, we have established that the ideal global 
signal interferometer is one comprised of elements with 
FWHM primary beam widths of ~ 40°, packed as closely 
together as possible. However, we have yet to specify the 
number of elements. In this section we imagine a regular 
square grid of N x N antennas and ask what value of N 
is required for an interferometer to perform as well as a 
single element in a measurement of the global signal. 

As one adds more and more elements to a regular array 


(increasing iV), the main effect is an increase in the num¬ 
ber of short baselines, providing repeated measurements 
of the same visibilities that can be combined to average 
down instrumental noise. While it is true that adding 
more elements to an array also gives rise to some longer 
baselines (since the only way to add antenna elements to 
a closely packed array is to add them to the periphery), 
these baselines have minimal response to the global sig¬ 
nal and only provide information regarding foregrounds. 
This information can in principle be used to help with 
foreground mitigation, but as we shall see when we dis¬ 
cuss data analysis in Section[3j it is difficult to use this in¬ 
formation without introducing chromatic signatures into 
the final global spectrum estimates. It is thus safer to 
severely downweight the influence of long baselines, min¬ 
imizing their influence on the final result. 

With multiple copies of the same baselines, an interfer¬ 
ometer can have just as high a signal-to-noise as single el¬ 
ement experiment, even if each individual baseline is less 
sensitive to the global signal. To quantify precisely how 
many such copies are necessary, we will now compute the 


expected noise variance from our estim ator Tq{v) of the 
global signal. Starting with Equation (191, we perturb 
the jth visibility V (b,, v) by adding an additive noise 
contribution n.j{v). Computing the variance II(iz, u) of 
the final estimator then gives 


n(^)E(T 0 H 2 )-(T „(,)) 2 


J2 k | / A(r, v) exp (i 27 r(jb fc • r) dCl\ 


( 21 ) 


where we have assumed that the instrumental noise has 
zero mean, so that (rij) = 0. We have additionally as¬ 
sumed that the noise is uncorrelated between baselines 
with variance cr 2 , so that (rqn*) = <J 2 5ij. Making the 
approximation that the sensitivity of the array to the 
global signal is dominated by the shortest baseline b s h or t 
of which there are A s hort copies, the noise variance of an 
interferometer-estimated global signal reduces to 


Ashort | / A(r, v) exp (i27r|b short ■ r) dfl\ 

( 22 ) 

On the other hand, for a single element experiment we 
have only a single baseline of length zero, so the noise 
variance is 


n(i/, v) 


2 a 2 

f A( r, u)dCl 


(23) 


where the extra factor of 2 arises from the fact that out¬ 
put voltages are squared in auto-correlation experiments, 
resulting in a squaring of the Gaussian noise contribu¬ 
tion. The variance of the squared noise then depends 
on the fourth moment of a Gaussian distribution, which 
gives an extra factor of 2 when expressed in terms of the 
variance of the signal, a 2 . Equating our last two expres¬ 
sion allows one to solve for the number of short baselines 
A s hort that are needed for an interferometer to have the 
same thermal noise sensitivity as a single element exper¬ 
iment: 

Ar _ |/A(f,z/)dH | 2 

-L' short - . \ 1 2 

21 / A(r, v) exp b short • ?) dfl \ 


(24) 
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To get a rough sense for the magnitude of Whort, suppose 
we make the flat-sky approximation. This gives 


N, 


short 


|^( 0 )| 


21 A (Z^bshort /c) | 

For Gaussian beams, we obtain 

iV short « iexp^TrX^rt 


(25) 


(26) 


Here, we see that N s h or t scales very strongly with base¬ 
line length. While this is due in part to our use of a 
Gaussian beam, the strong scaling is fundamentally due 
to the fact that sensitivity to the global signal drops 
rather rapidly with increasing baseline length. Again 
defining a closely packed array to be one with b s h or t/A ~ 
V8 In 2(2v / 2'7r0 o ) -1 , our expression gives iV s hort = 8. For 
an N x N square grid of antennasF] there are lV s hort = 
2 (IV — l) 2 shortest baselines formed by adjacent antenna 
pairs (half of which are in one direction, while the other 
half are perpendicular). Solving for N then gives N = 3, 
and therefore even a small array will allow an interfero¬ 
metric measurement to more than make up for the loss 
of sensitivity to the global signal from discarding auto¬ 
correlations. 

Admittedly, the calculation that we have just presented 
is rather sensitive to the details of one’s antenna pattern, 
and the final result of IV = 3 is particular to our Gaussian 
model. In practice, antennas to be used for a global 
signal interferometer ought to be carefully designed to 
ensure that the drop-off in sensitivity to the monopole 
is not too rapid. Fortunately, existing antenna designs 
already have a reasonable performance as far as thermal 
noise is concerned. The 150 MHz PAPER beam model 
used to generate Figure|6j for example, gives IV s h or t ~ 32 
{using the full curved sky expression given in Equation 
24) when the 2 m-sized PAPER antennas are placed next 
to each other. This requires a 5 x 5 array, larger than 
our previous estimate but certainly not an unreasonably 
large number. 


3. DATA ANALYSIS CHOICES 

In the previous section, we examined the trade-offs as¬ 
sociated with the design of an interferometer targeting 
the global 21 cm signal. Qualitatively, we concluded that 
one ought to design an interferometer array with a mod¬ 
est number of antenna elements, packed as closely to¬ 
gether as possible. Ideally, the antenna elements should 
possess primary beams that are neither too narrow nor 
too broad, with roughly a full-width-half-maximum of 
~ 40° at the lowest observation frequency. 

In this section, we assume that an appropriate array 
has been constructed, and consider instead various trade- 


9 Our translation from N s ^ OTt to N makes the crucial assumption 
that independent baselines have independent instrumental noise 
contributions. Recent calculations (A. Neben, private communica¬ 
tion) have suggested that this may not be a good approximation, 
particularly for tightly packed arrays that are designed to be sen¬ 
sitive to the global signal. However, since the number of anten¬ 
nas required is reasonably small, a straightforward solution to this 
problem is to simply construct a large number of two-element mini¬ 
interferometers rather than a larger single interferometer, with the 
mini-interferometers placed far away from each other to reduce 
correlated noise effects. 


offs in data a n alysis. Inspired by the near-separable mod¬ 
els of Switzer & Liu (2014), our proposed analysis meth¬ 
ods will usually involve a two-step process. First, the 
data are analyzed frequency-by-frequency, producing an 
estimate of the spatial monopole at every frequency chan¬ 
nel. Following the per-frequency analysis, the data are 
combined into a single global signal spectrum, where we 
take advantage of the long frequency-coherence length of 
foregrounds to perform a final foreground subtraction. 
In what follows, we will examine the pros and cons of 
various choices in the detailed implementation of each of 
these steps. 


3.1. Step 1: Extracting the Spatial Monopole from 
Visibilities 

We begin with a more general version of Equation Q, 
our measurement equation. Discarding the flat-sky ap¬ 
proximation, we have 

V(b) = J T(r)A(r) exp *27r—b • (27) 

where r is a unit vector that specifies locations on the 
sky. Expressing the sky in terms of spherical harmonics 
gives 

U(b) = E ( / Yem(i)A(r)e- i2 ^ bi dn\a, m . (28) 

im ' 


Since this equation is linear, we may write it as a matrix 
equation of the form 


y = Qx 


(29) 


where y is a vector of length TVbi (the number of base¬ 
lines) containing the visibilities measured at different 
baselines U(b), and where we have added an instrumen¬ 
tal noise contribution n. The matrix Q is the beam re¬ 
sponse of an antenna array at different baselines (rows) 
and differe nt sp heric al h armonics (columns). Comparing 
Equations (281 and (29), we see that the explicit form of 


Q is given by 


Q j,tm = / dClA(r)Y em (r)e~ 


(30) 


where A is the primary beam for the antennas, b j is the 
jth baseline, and A is the wavelength of observation. The 
sky is represented by x, which is a vector containing all 
the spherical harmonic coefficients at m . As such, it has 
length (£ max + l) 2 , where £ max is the largest £ value used 
in the model of the true sky. The global signal that we 
seek is proportional^ to the first component of x, i.e., 
aoo- Ultimately, then, we only need to form an estima¬ 
tor ago of this first component. However, it is crucial to 
bear in mind that the true x contains foregrounds with 
significant power in higher (£, m) modes, and that this 
power may leak into our estimate doo of aoo- The formal¬ 
ism that we present below will provide exactly the right 
machinery for quantifying such leakage. 


10 Throughout this paper, we adopt a spherical harmonic nor¬ 
malization convention where V’oo(f) = l/Vizr. A pure monopole 
To then has aoo = f YooTodCl = \/ 4ttTo. so to recover an estimate 
of Tq from an estimate of aoo, one must divide by yDr. 
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Since Q is not in general inverti ble (or even a square 
matrix), we cannot solve Equation (29) for x directly and 


instead can only recover an estimator for x. We consider 
an estimator of the form 


x = MQ f Hy 


(31) 


where H and M are both matrices that can be chosen by 
the data analyst. The H matrix is of size Wi x -Ny, and 
its role is to encode how the data analyst might wish to 
weight the different visibilities in forming estimates of the 
various spherical harmonics. Note that this weighting is 
in addition to the weighting that is naturally provided 
by Qf, which (for a given spherical harmonic mode) up¬ 
weights baselines if their visibilities are sensitive to that 
mode and downweights them otherwise. The matrix M 
measures fy max + l) 2 x (f m ax + l) 2 , and therefore oper¬ 
ates on a set of estimated spherical harmonic modes. Its 
role is to normalize our estimates of the different modes, 
and to possibly disentangle them from each other if there 
is leaked power between the modes. Our choices for M 
and H will determine the statistical properties of our fi¬ 
nal global signal estimates. For example, the variance £ 
of our estimator (the square root of which gives the error 
bars from instrumental noise) is given by 

£ = (xx t ) - (x)(x) t = MC^HNHQM, (32) 


where N = (yyt) — (y)(y)' is the instrumental noise 
covariance of the visibilities. This is clearly affected by 
our choices for H and M. 


3.2. A possible choice for M 
As a first guess, picking 


yet another linear combination, this time of different vis¬ 
ibilities. The goal of this linear combination is to invert 
the original linear combination of spherical harmonics 
that was formed by the instrument. Clearly, a necessary 
condition for this inversion to be successful is for there to 
be at least as many constraints (i.e., unique visibilities) 
as there are spherical harmonic coefficients to estimate. 
Ideally, an array ought to make many independent mea¬ 
surements per spherical harmonic mode to ensure a clean 
separation of modes. Since £ ~ 2nu, different £ modes 
are separated by A£ ~ 2ttAu. Given that £ can only 
take on integer values, this means that having enough 
measurements is tantamount to requiring our baselines 
be separated from each other by less than Am = 1 /27r on 
the uv plane. As a concrete example, imagine the square 
grid of antennas from the previous section, where neigh¬ 
boring antennas separated by a distance 6 s h ort • Baselines 
of this array will also form a square grid of points on the 
uv plane with the u and v coordinates given by integer 
multiples of Am = fyhort/A = & s hort*V c - Therefore, in or¬ 
der to have enough measurements for inversion, we must 
satisfy the condition 


fyhort 


2'Kl'r, 


(34) 


where we have evaluated our constraint at the maximum 
frequency v max we wish to probe, since that is where it is 
the most stringent. On the other hand, as we have argued 
above, physical constraints on antennas size dictate a 
spacing satisfying^]] 


fyhort A 


27T0f ) r' m i n 


(35) 


M = [Q f HQ] 


-l 


(33) 


might be considered an attractive choice. The final esti¬ 
mator has the desirable property that its ensemble aver¬ 
age (x) satisfies the condition (x) = x. This means that 
on average, the estimator for a particular spherical har¬ 
monic coefficient at m is equal to the true coefficient at m , 
and there is no leakage between different spherical har¬ 
monic modes. An estimate of the monopole is thus truly 
an estimate of the monopole only, with no contributions 
from other modes. 


However, the M matrix defined by Equation (331 


makes an assumption that may not be justified—it as¬ 
sumes that the combination [Q f HQ] is invertible. Es¬ 
sentially, since the inversion results in (x) = x, we can 
reverse our line of reasoning to see that any time our 
observations do not allow different at m to be perfectly 
disentangled from each other, Q^HQ will be uninvert- 
ible. For example, if only part of the sky is surveyed, the 
best that one can do is to measure linear combinations of 
different ag m s. Even if the full sky is surveyed (for exam¬ 
ple by observing the sky with a hypothetical wide-held 
instrument located at the equator), it is typically diffi¬ 
cult to design a broadband instrument that allows for a 
full inversion without sacrificing the design principles of 
the previous section, as we will now show. 

To perfectly isolate a given spherical harmonic coeffi¬ 
cient atm , it is necessary to incorporate information from 
multiple baselines, each of which measures a slightly dif¬ 
ferent linear combination of spherical harmonics on the 
sky. A clean extraction requires the data analyst to form 


where this time the tightest constraint occurs at the low¬ 
est frequency ^ m i n - 

The two constraints listed above make it difficult to 
probe a large frequency range with a single interferome¬ 
ter. To see this, note that the upper limit on bo decreases 
with increasing z/ max , while the lower limit increases with 
decreasing v m i n . With a wide enough frequency range, 
these two limits meet, and to avoid inconsistent con¬ 
straints, we require 


e b > — (36) 

^min 

as a minimum beam size. Since this critical beam size 
depends on the ratio of i/ max to i/ m i n , it is easier to sat¬ 
isfy our bounds with a narrowband instrument at higher 
frequencies, which is precisely the scenario that is unin¬ 
teresting for a global 21 cm signal experiment. Moreover, 
since z'max must be greater than i/ m j n the generosity of our 
bound saturates at ^ max = Gnim and our condition then 
requires that do > 1 rad. Recalling that 6 b is the standard 
deviation of a Gaussian beam (and not the FWHM), we 
see that essentially one needs horizon-to-horizon beam 

11 Note that strictly speaking, this constraint only applies to the 
shortest baselines, since it is possible to obtain sub-element sized 
spacings of longer baselines by slightly dithering the positions of 
antennas in a large array. For now we will disregard this inter¬ 
esting point because it is the shortest baselines that provide the 
greatest access to the global signal. However, in future work it 
may be possible to use sub-element dithering to cleanly measure 
high l modes of the sky, providing extra data-derived information 
on foregrounds. 
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coverage of the sky. As discussed previously, however, 
such a beam would be too wide to be optimal, as it 
would not allow the selective isolation of cold patches 
in the galaxy. Indeed, one can see from Figures [8] and [9] 
that with a FWHM of 60°, foreground residuals decrease 


rather slowly. 

From this, we see that if one is to adhere to the design 
principles of Section[2j the conditions required for Qf HQ 
to be invertible will necessarily be violated at some ob¬ 
servation frequencies. This problem can be alleviated 
slightly if one is willing to construct multiple narrow- 
band arrays to collectively cover a wide frequency range. 
However, this is not only an expensive solution, but also 
a relatively ineffective one—the best that one can do is 
to pursue an extreme approach where a different array is 
constructed (or a single array reconfigured) for every ob¬ 
servation frequency, but that corresponds precisely to the 
i/max = z'min case discussed above, and we have already 
seen that the required beam sizes are still too wide. 

Alternatively, one may simply replace the inverse of 
Q f HQ with its pseudo-inverse whenever the matrix 
Q f HQ is uninvertible. In doing so, however, one runs the 
risk of imprinting sharp spectral features into the final 
estimate of the global signal. This is because a pseudo¬ 
inverse inverts only the modes of a matrix that are 
present with non-zero eigenvalue, and the set of modes 
that are present will in general be frequency-dependent. 
The imprint of sharp spectral features should be avoided 
at all costs, since sharp features have the ability to mas¬ 
querade as the cosmological signal. 

For all the reasons listed above, we therefore recom¬ 
mend against the use of Equation (33) for M. 


3.3. A better choice for M 


As an alternative to Equation (33), consider the diag¬ 
onal matrix given by 


M- ■ = _ - _ 

13 (QlHQ)j 


(37) 


where 6 l: j is the Kronecker delta. With this choice of M, 
each ai m estimate (each component of x) is a linear com¬ 
bination of the true a^ m coefficients. At first sight, this 
seems to be a drawback of this choice for M, since our 
goal is to measure the cosmological monopole. However, 
we will soon use a variety of illustrative special cases to 
see that this is not so, and that some leakage between dif¬ 
ferent aim coefficients—if appropriately constrained—is 
in fact a feature. To quantify this leakage, we take the 
ense mble average of Equation (31) and insert Equation 
(29). This yields 


(x) = MQ'HQx 


(38) 


Defining a window function matrix W as 

W = MQ^HQ, (39) 

we see that (x) = Wx, so each row of W gives the linear 
combination of spherical harmonics actually probed by 
our estimator &i m . The matrix W quantifies the amount 
of leakage between modes, and it depends on both our 
instrument (vi a Q) and our data analysis method, via 
Equations ( [3T| ) and ( |37| ). Of particular interest will be 
the first row of W, which will tell us what combination 


of spherical harmonics are actually being measured when 
we attempt to constrain the monopole a oo mode. 

The choice of M used here has several attractive prop¬ 
erties. In Appendix [A] we prove that if H is used as 
an inverse noise covariance weighting of visibilities (i.e., 
if we have H = N -1 ), then our diagonal choice for M 
minimizes the variance (an d th erefore the error bars) on 
x. Additionally, Equation (l37| has the property that di¬ 
agonal elements of W always equal unity, regardless of 
what H is used. Focusing on the first row then, we have 
Wn = 1, which implies that the amplitude of a pure 
monopole sky is preserved by our measurement and data 
analysis procedures. In other words, there is by con¬ 
struction never any signal loss in this stage of the analy¬ 
sis, where we combine visibilities into spherical harmonic 
coefficients. 


3.3.1. Single Element Limit 

Consider the single-element limit as an illustrative ex¬ 
ample of how our choice of M works and how leakage 
between spherical harmonic modes can be a desirable fea¬ 
ture. The single-element limit is representative of auto¬ 
correlation experiments such as EDGES. With a single 
antenna element, the matrix Q re duces to a single row 
vector consisting of Equation (301 evaluated at baseline 
length bj = 0. The measurement vector y becomes a 
single measurement y 1 given by the primary beam inte¬ 
grated over the sky: 


V = 


T{r)A(r)dn. 


Equations (31) and (|37|) then reduce to 


_ f dnYi m (r)A(r) 
[JdDA(f )] 2 


(40) 

(41) 


For the global signal (i.e., spatial monopole), we are in¬ 
terested in the first component of this x vector. Isolating 
this and dividing both sides y/in to convert from a oo to 
the spatial mean of the sky, we obtain 


T 0 = 


y 

f dflA(r) 


(42) 


which is the estimator that one would have guessed 
from simple considerations—the measurement y is just 
a weighted average of the sky, and the denominator nor¬ 
malizes the weights. This is in fact also a special case of 
the estimator given in Equation (10), where the baseline 
vector b is set to zero. Note that there was no need to 
specify the matrix H because, with only a single mea¬ 
surement from a single element, H r edu ces to a single 
scalar. The copy of H in Equation (l3lj ) will therefore 
always cancel the copy in Equatio n (3<1 ). 

Explicitly evaluating Equation (39) ror our single el¬ 
ement case, the window function for the sky monopole 
(i.e., the first row of W) is given by 


W 0 (£,m) 


J dflA(r)Yi m (r) 
f dflA(r)/y/47r 


(43) 


Naively, one might have hoped for Wq to be zero for 
all values of £ and rn except for (£,m) = (0,0), so that 
our estimator for the sky monopole does not contain any 
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leaked power from other sphe rical harmonics. This is 
what the M matrix of Section 13.21 would have achieved 
and is what was sacrificed by our new choice of M. On 
closer examination, however, the leakage appears to be 
rather innocuous and is simply the result of our having 
only surveyed a small portion of the sky (the part that 
lies within the primary beam). Confirming this inter¬ 
pretation is the form of the numerator in Equation (43), 


which is the spherical harmonic decomposition of the pri¬ 
mary beam. 

As discussed above in Section [2] it is advantageous to 
concentrate observations on cooler parts of the sky. The 
non -zero width of the window function given by Equation 
(43) is thus a desirable feature. Another way to see this 


is to recognize that focusing on a small patch of the sky 
(and therefore accepting our broader window function) 
gives us a better estimate of the cosmological monopole 
signal than if we were somehow able to force the window 
function to be zero away from (£, to) = (0,0). In the 
latter case we would be better estimating the monopole 
signal of total sky emission, but much of this would be 
due to the foreground contribution. The key point is 
that the foregrounds also have a monopole, and without 
an a priori way to distinguish between the monopole of 
the cosmological and the monopole of the foregrounds, 
making a “clean” measurement of the sky’s monopole 
simply adds stronger foregrounds. 

3.3.2. Interferometric case 

We now turn to the interferometric, multi-baseline 
case. With our measurement y now consisting of more 
than just a single number, there is the opportunity to 
weight our data in non-trivial ways. Put another way, it 
will be necessary to decide on a form for H, since its two 
copies will no longer cancel when forming x. 

Consider an inverse noise covariance weighting of H = 
N _1 . With the assumption that instrumental noise is 
uncorrelated and uniform across different baselines, this 
is equivalent to H = I. The window function matrix 
then simplifies to Wa MQlQ, with the key piece being 
QtQ. The M matrix is irrelevant to any discussion of 
leakage between different sph erical harmonics, since the 
form given by Equation (37) is diagonal, and thus the 


matrix only provides a normalization for each spherical 
harmonic without further mixing between modes. Eval¬ 
uating the window function matrix explicitly, we obtain 


tm,t'm' (Q^Q) 


tm.t'm' 


(/ dnA ^) Y L^y 


x ( / dn'A(r')Y rm '(r)e~ 


.(44) 


For the purposes of measuring the global signal, it is 
again the first row of this matrix that is the most rele¬ 
vant. Making the flat-sky approximation for the sake of 
intuition yields 

W 0 (u)«^I*^jl(u-^. (45) 

Now, A is a function that peaks at the origin and drops 
off on a characteristic scale (27r$£,) _1 . Thus, this expres¬ 


sion tells us that the highest u = |u| scale that is probed 
by our global signal interferometer is 


&max 1 _ Ny/2 + 1 

~T~ + 2^e~ b ~ 27 x9 b 


(46) 


where 5 max is the longest baseline in the array, and in the 
last equality we assumed a closel y pa cked N x N square 
array, just as we did in Section |2.4[ The reciprocal of 
this expression gives the finest angular scale 6^ ne that 
our interferometer is sensitive to: 


dfine 


/ F WHM \ / Ny/2 + l\ 

V 40° J \ 5v^+l ) 


(47) 


Here, we have eliminated 9 b (which corresponds to the 
standard deviation for a Gaussian beam) in favor of the 
FWHM, and have used a fiducial array size of N = 5 to 
be slightly on the conse rvati ve side of our optimal N = 2 
or greater from Section [2~4l We may thus conclude that 
an interferometer that is designed in accordance with the 
principles laid out in Section [2] will not be sensitive to 
scales finer than ~ 10°. Since the anisotropies of the 
cosmological 21 cm sig nal are negligible beyo nd an angu¬ 
lar scale of ~ 1° to 2° ( Bittner fc Loeb|2011 1 , the modes 
that are measured by our interferometer are essentially 
global signal modes, even if they are not formally the 
u = 0 (or I = to = 0) mode. Indeed, this argument is 
one that is implicitly invoked by most theoretical sim¬ 
ulations of the global signal—since full sky simulations 
are too computationally expensive to perform, most (if 
not all) simulations simply average over an angular field 
of view that is much greater than the angular scale of 
anisotropies, and declare the result the global signal. 

In short, the leakage of higher spherical harmonic 
modes into our estimate of the global signal is likely not 
a concern. In fact, our estimate is a conservative one, be¬ 
cause we assumed that the longest baseline of an array 
contributes significantly to the estimator of the monopole 
mode. In practice, long baselines have so little response 
to the monopole that its contribution to our estimator is 
heavily downweighted by the presence of Ql. This man¬ 
ifests itself as the A*(bfc/A) term in our approximate 
window function, Equation ( |45[ ). The window function 
for the monopole is therefore preferentially dominated by 
the baselines that probe broad angular scales. 

Importantly, however, our argument relied on the fall- 
off of A. If the primary beam contains fine features, A 
becomes a rather broad function, and one’s interferome¬ 
ter begins to be sensitive to the more substantial small- 
scale modes of the 21 cm anisotropies. This may be an 
important effect for the global signal measurements per¬ 
formed at LOFAR using lunar occultations, which_ im- 


prints sm all spatial structures in the beam (Vedantham 
et al.|2015). Fortunately, our formalism provides an easy 
way to compute the relevant window functions to assess 
the viability of occultation measurements. 


3.4. Choices for H 

Having motivated our diagonal choice for M, we now 
consider our choice for H, which weights the visibilities. 
We will find that although the presence of H in our esti¬ 
mator provides additional flexibility that can in principle 
be harnessed to better suppress foregrounds, a “simple 
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is best” approach of setting H = I is more robust and 
gives better final results. 

To see how H can aid in foreground mitigation, imag¬ 
ine we were able to accurately model the full covariance 
matrix of foregrounds Nf g . Picking H = NZ 1 in Equa¬ 


tion (31) would then downweight select modes in the vis¬ 
ibilities y that are particularly foreground-contaminated, 
before the Qf matrix converts the collection of visibilities 
into estimates of spherical harmonic estimates (albeit un¬ 
normalized ones, prior to the action of M). In practice, 
one does not possess sufficient information to construct 
a full covariance matrix, and approximations must be 
made. 


of the sky will end up possessing correlated phases. Since 
the visibilities are approximately measurements of the 
harmonic sky, they too will be correlated. These correla¬ 
tions can then be used to downweight the contributions 
of brighter parts of the foreground sky. 

Concretely, suppose we form an image-space covari¬ 
ance matrix R, which is given by 

Rjj = m 2 (r i )5 ij (48) 

where m 2 (f\) is the model for the foreground sky bright¬ 
ness at the ?’th pixel with angular position rj . To use this 
in our estimator requires translating this matrix into vis¬ 
ibility space by constructing 


3.4.1. Approximating H = N^ 1 as a diagonal matrix in 

harmonic space 

Suppose that one were to approximate the foreground 
covariance matrix as diagonal in some judiciously chosen 
basis. Of course, any matrix is by construction diagonal 
in its own eigenbasis. However, moving into this basis 
would require knowing the exact covariance matrix to 
begin with. Instead, our goal should be to find some ba¬ 
sis that sufficiently captures the features of the matrix 
that are needed for foreground mitigation. Consider, for 
example, a matrix that is diagonal in harmonic space. 
Modeling the foreground sky in such a way is tantamount 
to saying that the foregrounds are statistically isotropic, 
and therefore describable using a power spectrum. While 
this may be sufficient for some applications, in our case 
it is relatively unhelpful. To see this, consider the action 
of an interferometer in the flat-sky approximation. Dis¬ 
regarding the primary beam for a moment, each baseline 
would simply measure a different Fourier mode of the 
sky. In the full formalism, Q maps spherical harmonics 
to visibilities; in the flat-sky approximation, the spherical 
harmonics and visibilities both reduce to Fourier modes, 
so Q correspondingly reduces to I. The M matrix in our 
estimator therefore simplifies to H _1 , which then acts 
directly on our H weighting of the visibilities (since Qf 
is now the identity). The two copies of H then cancel 
each other, and the estimator becomes x = y. Concep¬ 
tually, the visibilities are already a measurement of the 
harmonics of the sky, and thus the different visibilities 
never have to mix with each other to produce our final 
(harmonic) estimator. Any downweighting of a strong 
foreground mode in harmonic space is then simply up¬ 
weighted back to its original strength, and no foreground 
mitigation happens. Of course, in a realistic situation 
we violate the assumptions of the flat-sky and a uniform 
primary beam, but the foreground suppression effects are 
still likely to be minimal. 

3.4.2. Approximating H = N ^ 1 as a diagonal matrix in 

image space versus setting H = I 

In contrast, consider a foreground covariance matrix 
that is diagonal—but not the identity- in image space. 
When transformed into visibility space, the Nf g ma¬ 
trix then contains off-diagonal elements. In acting on 
y through H = N ^ 1 , different visibilities are then mixed 
together in an effort to suppress foregrounds. Physically, 
this corresponds to the statement that if the foregrounds 
are not statistically isotropic (so that the diagonal ele¬ 
ments of the covariance vary), the harmonic coefficients 


N fg = GRG t , 


where 


G ij = A(fj) exp ( — ? 27 t ^ • Yj ) A Cl, 


(49) 

(50) 


with A being the solid angle encompassed by each pixel 
of our sky model. Note that in general Nf g is a smaller 
matrix than R, since the former measures Ni,\ x Ayi while 
the latter measures N p i x x AT p i x , where N plx is the number 
of pixels in our model. 

In Figure [lOj we compare the global signal estima¬ 
tors that result from analyzing simulated data using 
H = N ^ 1 to those that are obtained with H = I. For the 


foreground model in Equation (48 1 , we use the Haslam 


map at 408 MHz. Note that even though the Haslam 
map will have the wrong amplitude for observations in 
our frequency band, the overall amplitude of our fore¬ 
ground map will always cancel out i n ou r final estimator, 
since H appears both in Equation (31) and in our nor¬ 


malization M. In other words, only the angular shape 
of the foreground sky ma tters. The instrumental simula¬ 
tions used for Figure [T0| are identical to those of Section 
|2.3[ except with the FWHM of the primary beams set 
to 88 °. With a broader beam, one exaggerates the ef¬ 
fect of setting H = N^ 1 . This is because a larger field 
of view captures more of the strongly anisotropic nature 
of foregrounds, which makes a selective downweighting 
of the brighter parts of the sky more important. Figure 
[ 10 ] shows that such a downweighting does in fact reduce 
the brightness of the foregrounds, but only by a small 
amount. Part of this is because of the small size of our 
array, which limits the number of long baselines that are 
available for resolving fine spatial structures on the sky. 
However, incorporating longer baselines into our mea¬ 
surement quickly runs afoul of the constraint imposed 
by Equation (47). 


The reduction in foreground contamination is more 
pronounced if observations are centered on the galactic 
plane. There, the foregrounds vary strongly with galac¬ 
tic latitude, making downweightings much more impor¬ 
tant, and reductions of up to a factor of ~ 2 are possi¬ 
ble. In practice, however, one tends to avoid observing 
in the galactic plane anyway. Therefore, the benefits of 
H = Nfg 1 are likely to be minimal, particularly when one 
moves back to using the narrower (but not too narrow) 
beams suggested in Section |2.3[ which can more easily 
isolate patches of the sky that are more approximately 
isotropic. 
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Fig. 10.— The foreground bias ATo as a function of frequency, 
comparing the data analysis method that weights visibilities by the 
inverse of the foreground amplitude (the solid blue line) with one 
that weights visibilities uniformly (the dashed orange line). The 
experimental setup consists of a closely-packed 6x6 array of an¬ 
tennas with a primary beam of 88°, with observations centered on 
the NGP. Adding the foreground-motivated weighting does reduce 
the foreground bias, but not in any significant way. 



Fig. 11.— Foreground residuals from Figure [To] after a 4th or¬ 
der logarithmic polynomial subtraction. The uniformly weighted 
residuals are now seen to be weaker than those of the inverse fore¬ 
ground weighted analysis. Inverse foreground weighting is thus 
seen to imprint extra chromaticity in measured spectra, making 
spectral foreground subtraction more difficult. 


lar resolution is necessary, which means that the estima¬ 
tor must weight longer -and more chromatic—baselines 
more heavily. Long baselines also have the disadvantage 
of having low sensitivity to the monopole. With a heav¬ 
ier weighting of long baselines, it is more difficult for an 
interferometer to match the thermal noise sensitivity of 
a single element experiment. Although in future work 
it may be possible to eliminate all of these issues with 
a more sophisticated foreground-motivated form of Nf g , 
for now we propose the use of H = I to be conservative. 


3.5. Step 2: Fitting smooth foregrounds 

After the frequency-by-frequency combination of vis¬ 
ibilities into an initial estimate of the monopole mode, 
one obtains spectra such as those shown in Figure [7] and 
Figure [lOj The spectra are clearly still dominated by 
smooth, near-power law foregrounds. It is therefore nec¬ 
essary to take further steps to mitigate foregrounds once 
the data have been reduced to a single spectrum. 

One approach is to subtract off smooth functions from 
the spectrum, be they polynomials or more data-driven 
forms (such as principal component spectra used in 
Switzer & Liu 2014). The (hopefully foreground-free) 
residuals can then be compared to theoretical models for 
the global signal, although care must be taken to prop¬ 
erly account for the possibility that part of the cosmo¬ 
logical signal may have been subtracted along with the 
foregrounds. An alternate approach, which we adopt in 
the rest of this paper, is to follow in the footsteps of 
Pritchard & Loeb| (20101; Harker et al. (20121; Bernard!] 
et al.| (|2UI4p and fit tor foreground and cosmological 
model parameters simultaneously. Doing so provides a 
natural description for signal loss, which manifests itself 
as degeneracies between foreground parameters and cos¬ 
mological model parameters. One important trade-off is 
to decide how many foreground parameters to include. 
If too many parameters are used, much of the cosmolog¬ 
ical signal will be absorbed into the foreground model, 
resulting in large degeneracies and large final error bars 
on the parameters. On the other hand, having too few 
parameters will result in foreground residuals that will 
bias cosmol ogical parameter valu es. We take the same 
approach as Bernardi et al. (2014), where we include just 
enough foreground parameters for the cosmological pa¬ 
rameter bias to be subdominant to the errors. 


More seriously, setting H = N f 1 has the potential to 
introduce harmful artifacts into the final spectra. As one 
begins to subtract smooth components from the spectra, 
the residuals from the uniformly weighted analysis often 
become smaller than those from the f oreg round-weighted 
analysis. This can be seen in Figure [lT] where we show 
the residuals after a 4th order log-polynomial subtrac¬ 
tion. This is the result of the foreground-weighted anal¬ 
ysis introducing unsmooth structures into the intrinsi¬ 
cally smooth spectrum. To be fair, it is possible to find 
combinations of polynomial orders and instrumental pa¬ 
rameters where the foreground-weighted method outper¬ 
forms uniform weighting. However, it is certainly not 
uncommon for H = NjT 1 to perform worse. Much of 
this is because the quest to downweight brighter por¬ 
tions of the foreground sky requires the isolation of spa¬ 
tially small patches. To isolate these patches, high angu- 


3.6. Summary of data analysis methods 

In summary, we propose a “simple is best” approach 
for extracting the global signal from interferometric data. 
In w hat follows, we will set H = I and ado pt E quation 
([37| for M. Plugging these into Equation ( |31[ ), recast¬ 
ing our vector/matrix expressions in terms of continuous 
functions, and once again dividing by \J~Fk to convert 
from an estimate of ago to the global signal, we obtain 


Ej [f dflA(r,f) exp (i2n^bj ■ ?)] V(bj,v) 
Ek 1 1 dflA(r, v) exp (i27r(jb k • r) f 


(51) 

where we have re-introduced the frequency dependence 
of various quantities in our notation. Essentially, our 
estimator amounts to performing a linear fit (frequency- 
by-frequency) to our data in order to find the value of the 
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monopole that is most consistent with our measured vis¬ 
ibility. Following this, we fit the spectrum with a model 
that includes foreground fits and cosmological parame¬ 
ters. For our foreground fi ts, w e use the same parametric 
forms as we did in Section [273} namely Legendre polyno¬ 
mials in logT-logz/. 


4. NUMERICAL SIMULATIONS 

In this section, we bring together the various lessons 
that we have learned regarding instrument design and 
data analysis to numerically forecast the performance of 
a fiducial global signal interferometer. Much of our simu¬ 
lation methodology has already been employed in previ¬ 
ous sections to produce intermediate results, but we will 
provide a quick summary here (and add new details) for 
the reader’s convenience. 

Guided by the rough arguments of Section [2j we simu¬ 
late visibilities from a 6 x 6 square grid of tightly packed 
antennas. The primary beam of each element is taken 
to be a tapered Gaussian of the form given by Equa¬ 
tion (18), with Of, = 0.3rad (for a FWHM of 38.7°) at 
the lowest observation frequency. At higher frequencies, 
the beam width is assumed to be proportional to A. We 
assume that observations are centered on the Northern 
Galactic Pole and span a band consisting of 1 MHz chan¬ 
nels from 100 MHz to 200 MHz for the reionization, and 
separately from 50 to 100 MHz for the pre-reionization 
epoch. (In other words, we are considering two separate 
experiments, both with a primary beam FWHM of 38.7° 
at the lowest frequency part of their band). For compar¬ 
ison, we also predict the performance of a single-element 
global signal experiment using the same type of antenna 
element and the same observing strategy. 

For our simulated foregr ound sky, we use the same 
set-up as we did in Section |2.3[ where each pixel of the 
408 MHz map is extrapolated to the relevant frequencies 
on a pixel-by-pixel basis using Equation (17). The pa¬ 


rameters in the power-law-like extrapolation are drawn 
randomly as before, and we generate 10,000 different 
realizations of the foreground sky. With each sky, we 
then simulate visibilities and total power measurements 
for the interferometric and single element measurements, 
respectively. Frequency-by-frequency estimates of the 


global signal are then obtained using Equation (42) for 
the single element and Equation ( |51| ) for the interferome¬ 
ter. The results are then averageatogether to yield mean 
foreground spectra for each type of experiment. Finally, 
smooth foreground components are fit fro m th ese s pec- 
tnf 17 ! in the manner described in Sections l2.3l and l3.5l 
The result is a set of residual foreground spectra. 

Aside from residual foregrounds, our forecasts must 
also incorporate instrumental noise. Modeling this con- 


12 In principle, one ought to perform foreground fits prior to 
ensemble averaging the different sky realizations. However, since 
each line of sight is generated independently in our simulations, 
ensemble averaging essentially amounts to generating more lines of 
sight. This in fact results in a more conservative foreground model. 
To see this, consider a toy example where each line of sight is a 
randomly drawn power law. Fitting each pixel individually with a 
power law would be guaranteed to return no residuals. However, 
since the sum of power laws is not itself a power law, the aver¬ 
aged spectrum over all pixels contains greater curvature, which 
in general will not be well-fit by a power law. We may therefore 
safely ensemble average prior to foreground fitting, knowing that 
the result will be a more conservative foreground spectrum. 


tribution requires three separate covariance matrices. 
The first is the instrumental noise covariance N of the 
visibilities. We assume that the instrumental noise is 
uncorrelated between different baselines, so that 




tint A/V 


(52) 


where the indices refer to different baselines, tt p = 
f A(r)dfl is the size of the primary beam, T sys is the 
system temperature, A v is the channel width, and £ int 
is the total integration time. We take A v = 1 MHz and 
tint = 300 hrs. For the system temperature, we assume 
that the measurements are sky-noise dominated, and we 
set T sys equal to the primary beam averaged sky temper¬ 
ature. 

With the noise covariance of the visibilities N in hand, 
we can obtain the covariance matrix £ of our estima¬ 
tor x. To do so, we insert N into Equation (32). Since 
x contains estimates of all the spherical harmonic modes 
that we wish to solve for, it is an (f? ma x + l) 2 x (^max + 1) 2 
matrix relating all the errors on the ae m estimates to one 
another. With our focus being the monopole term, we 
require only the first element on the diagonal of £. Ex¬ 
tracting this element and dividing by 47 t (the square of 
the conversion between ago and To) gives the variance 


on To. Repeating this process for every observation fre¬ 
quency, we can place the resulting variances along the 
diagonal of yet another covariance matrix II. This is the 
frequency-frequency noise covariance matrix of our final 
spectrum, and by populating only its diagonal elements 
(setting all other elements to zero), we are assuming that 
noise contributions from different frequencies are uncor¬ 
related. Note that even though we established this pro¬ 
cedure for computing II with interferometers in mind, it 
can be easily adapted for the single-element experiments 
as well. Considering a single baseline of length zero, one 
obtains T s 2 ys (^)U p /£ int Av along the diagonal of II, which 


simply needs to be enhanced by a factor of 2 to account 
for the correlated noise discussed in Section [2~11 
To evaluate the effectiveness of our global signal in¬ 
terferometer, we employ a Fisher matrix formalism. Our 
set-up is es sentially identical to th at of |Pritchard fe Loeb] 
(20101 and Bernardi et al. (2014), and thus we relegate 
a review of the formalism to Appe ndix |Bl As a toy model 
for the dark ages, we again follow Bernardi et al. (20141 
and model the dip at 70 MHz as a Gaussian: 


Td\p(y) = — Aexp ^ 2 ^ , (53) 


where A is the amplitude of the signal, vg is the center of 
the pre-reionization absorption dip, and cr is the width. 
For the reionization signal, we use the form 


-Union (^) 


T21 

2 



1 + tanh 



(54) 


where z r is the redshift of the mid-point of reionization, 
Az is its rough duration, T 2 \ is an overall amplitude, and 
2 = (1420 MHz/i/) - 1. 

The errors on final model parameters will depend on 
the fiducial “true” values that are used in our simula¬ 
tions. We consider three different reionization scenarios: 
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Reionization 


Pre-reionization Dip 

Model 


T 2 1 [K] 

z r 

Az 

A [K] 

v 0 [MHz] 

a [MHz] 

Pessimistic 

Fiducial Value 

0.010 

12 

3 

0.01 

60 

10 


Interferometer Bias 
Single Element Bias 
Interferometer Error 
Single Element Error 

2.85 x 10" 1 
2.50 x 10 _1 
±1.11 x 10 _1 
±2.09 x 10 _1 

3.07 

7.66 x 10' 1 
±2.97 
±2.69 

3.94 x 10“ 2 
-3.68 x lO -1 
±2.99 
±2.92 

-5.59 x lO -1 
9.2 x 10“ 2 
±5.45 x 10 _1 
±5.65 x lO -1 

-1.77 x 10 2 
3.07 x 10 1 
±1.72 x 10 2 
±2.06 x 10 2 

8.68 x 10 1 
-1.41 x 10 1 
±8.55 x 10 1 
±1.16 x 10 2 

Moderate 

Fiducial Value 

0.027 

10.5 

0.8 

0.1 

70 

5 


Interferometer Bias 
Single Element Bias 
Interferometer Error 
Single Element Error 

-5.90 x 10“ 3 
-9.53 x 10" 4 
±4.06 x 10 -3 
±4.35 x 10 -3 

6.39 x 10 -3 
1.54 x 10“ 2 
±1.31 x 10 -2 
±3.28 x 10 -2 

-7.33 x 10“ 2 
-2.63 x 10“ 2 
±5.86 x 10" 2 
±7.81 x lO" 2 

1.50 x 10“ 4 
1.15 x 10 -3 
±2.44 x 10 -3 
±1.21 x 10 -2 

1.77 x 10 -3 
5.95 x 10 -2 
±3.38 x 10" 2 
±1.74 x 10 _1 

-1.9 x 10“ 3 
-2.36 x 10“ 2 
±5.19 x lO" 2 
±2.63 x lO” 1 

Optimistic 

Fiducial Value 

0.027 

8 

0.5 

0.2 

80 

5 


Interferometer Bias 
Single Element Bias 
Interferometer Error 
Single Element Error 

1.95 x 10“ 4 
1.26 x 10 -3 
±1.10 x 10“ 3 
±1.21 x 10“ 3 

4.10 x 10“ 3 
-2.50 x 10 -3 
±5.88 x 10“ 3 
±6.82 x 10“ 3 

1.83 x 10“ 3 
1.17 x 10 -2 
±1.58 x 10" 2 
±1.89 x 10“ 2 

-8.62 x 10“ 5 
-6.50 x lO" 4 
±1.65 x 10 -3 
±6.78 x 10“ 3 

-3.80 x 10" 4 
-1.56 x 10 -2 
±1.04 x 10“ 2 
±3.3 x 10 -2 

8.63 x 10“ 4 
6.42 x 10 -3 
±1.99 x 10“ 2 
±8.36 x 10" 2 


TABLE 1 

Parameters, biases, and 1 cr error bars for different scenarios and experiments. 



Fig. 12.— Fiducial model of the brightness temperature dur¬ 
ing reionization as a function of frequency, following three scenar¬ 
ios: Pessimistic (blue), Moderate (orange), and Optimistic (green). 
Increasing optimism corresponds to reionization occurring more 
rapidly and at lower redshifts, making detection easier. 



Fig. 13.— Fiducial model of the brightness temperature during 
the pre-reionization dip as a function of frequency, following three 
scenarios: Pessimistic (blue), Moderate (orange), and Optimistic 
(green). Increasing optimism corresponds to a deeper dip occurring 
at lower redshifts, making detection easier. 


• Pessimistic reionization scenario, with 
(T 2 i, z r , Az) = (lOmK, 12, 3). With reioniza¬ 
tion occurring in a rather extended fashion at 
relatively high redshifts, this scenario should be 
the most difficult one to detect, since foregrounds 
are brighter at high redshifts. Additionally, an 
extended reionization scenario more closely mimics 
smooth foregrounds. 


• Moderate reionization scenario, with 
(T 21 ,z r ,Az) = (27mK, 10.5,0.8). This model 
is motivated by the best-fit value of z r from the 


Wilkinson M icrowave Anisotropy Probe (Hinshaw 
et al. 2013). The valu e of T 2 1 is t aken from 
theoretical calculations (Pritchard & Loeb 2010), 
while A^r is chosen to be neither too extended nor 
too abrupt. 


• Optimistic reionization scenario, with 
{T 2 \,z r ,Az) = (27mK, 8,0.5). This scenario 
is motivated by recent optical and infrared ob¬ 


ser vat ions 

(Fan et al. 2006 

Bolton et al. 

2011 

Treu et al. 

2(Jl3l i'alsst et al. 

2014[). Having said 

this, our c 

mice of {T 21 ,z r , (27 mK, 8,0.5) 


is not taken from any of these optical/infrared 
studies in particular, since there is considerable 
uncertainty in how the observations should be 
interpreted in the context of reionization. Rather, 
we simply wish to roughly capture the fact that 
generically, such observations favor reasonably 
rapid reionization at lower redshifts, in contrast to 
what is suggested by CMB experiments. 


Since measurements of the Gunn-Peterson trough 
strongly suggest that reionization is complete by z ~ 6 
(Fan et al. 2006), we choose to impose this as a prior 
in our Fisner matrix projections. We will find that only 
the pessimistic scenario is affected by the prior in a non- 
negligible way. We therefore implement our prior by as¬ 
suming that z r and Az are both already known to within 
±3 at la, such that a 2er fluctuation would be required 
for the IGM to be substantially neutral at z ~ 6 in the 
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pessimistic scenario. 

With the pre-reionization dip we again consider three 
scenarios of varying degrees of optimism, although with¬ 
out the CMB as a guide, the parameters here are some¬ 
what more arbitrary: 


• Pessimistic pre-reionization scenario, with 

(A,v 0 ,Az) = (10 rnK, 60 MHz, 10 MHz). 

• Moderate pre-reionization scenario, with 

(A, iz 0 Az) = (100 mK, 70 MHz, 5 MHz). 

• Optimistic pre-reionization scenario, with 

(A,v 0 ,Az) = (200 mK, 80 MHz, 5 MHz). 


These scenarios are depicted in Figures [l2| and 13 The 
corresponding model parameters are summarized in Ta- 
ble|TJ along with the associated Fisher matrix projections 
for each parameter’s bias and lcr error bar after having 
marginalized over the other parameters. Pairwise param¬ 
eter contours are shown in Figures |T4] and Figures 15 
The results for interferometers are shown using the fillec 
regions, with orange portions signifying 95% confidence 
regions and the enclosed white portions signifying 68% 
confidence regions. The black contours demarcate the 
68 % and 95% confidence regions for the single-element 
experiments. 

Under our insistence that the parameter biases from 
residual foregrounds be subdominant compared to the 
parameter errors, we find that for reionization, it is nec¬ 
essary to set -/Vpoiy = 6 for the interferometer whereas for 
the single-element experiment IV po i y = 5 suffices. With 
the pre-reionization dip, we require N po i y = 7 for the in¬ 
terferometer and jY po i y = 6 for the single element. The 
general trend of higher JV po i y for the interferometer re¬ 
flects the inherently more chromatic nature of interfer¬ 
ometry. The generally higher 7V po i y needed to detect 
the pre-reionization dip (compared to that needed for 
to detect reionization) reflects the stronger foregrounds 
at lower frequencies. 

Immediately obvious from Figures [12] and [13] is the fact 
that the pessimistic scenarios will be extremely difficult 
to measure, whether using an interferometer or a single 
element. With those scenarios, the cosmological signals 
are simply too extended, and occur at too high redshifts 
for them to be easily distinguished from the bright fore¬ 
grounds. Indeed, one can see from Table[j]that the errors 
on z r and Az are all approximately ±3, indicating that 
constraints are driven entirely by prior information. 

Encouragingly, we see that both the moderate and the 
optimistic scenarios should be detectable by both types 
of instrument. Importantly, one sees that the interfer¬ 
ometer performs just as well as the single-element ex¬ 
periment does. For the pre-reionization dip, we often 
even get slightly smaller error bars with the interferome¬ 
ter. This is because we conservatively chose to simulate a 
6 x6 array, when a smaller—and therefore higher-noise— 
array wou ld h ave sufficed according to our calculations 
in Section |2.4[ This translates into tighter constraints in 
the moderate and optimistic scenarios, the errors turn 
out to be mostly thermal noise dominated. We note that 
this is not universally the case, and happens only when 
the cosmological signals in question are sufficiently dif¬ 
ferent from the foregrounds (hence the fact that our dis¬ 
cussion here applies mostly to the pre-reionization dip 


instead of reionization itself). To understand this, con¬ 
sider the effect of varying iV po i y in one’s analysis pipeline. 
With iV po i y set too low, the bias is too large and the mea¬ 
sured parameters are inaccurate; with iV po i y set too high, 
the degeneracies between the foreground model and the 
cosmological model result in a large variance and the 
parameters are imprecise. If the cosmological signal is 
different enough from the foregrounds (as is the case in 
the moderate and optimistic scenarios), there are a set of 
intermediate iV po i y where the bias is small but the degen¬ 
eracies have yet to dominate, resulting in a thermal-noise 
dominated measurement. This is particularly desirable 
because it means that measurements with more integra¬ 
tion time will yield better constraints. On the other 
hand, the cosmological signals in the pessimistic cases 
are sufficiently similar to the foregrounds that the vari¬ 
ances increase rather quickly with increasing IV po i y , and 
come to dominate before the biases become negligible. 
There is thus never a thermal noise dominated regime, 
and indeed, we find that increasing the integration time 
does very little to improve the constraints in the pes¬ 
simistic scenario. 

In summary, the results here suggest that an inter¬ 
ferometric measurement of the global signal may be an 
interesting, viable alternative to single-element experi¬ 
ments. Given the small number of antennas necessary for 
a competitive interferometric array, one could even imag¬ 
ine collecting both auto-correlation and cross-correlation 
(visibility) data between antennas, analyzing data from 
the two modes separately as a way to cross-check the 
final results. 


5. CONCLUSIONS 

In this paper, we explored the unusual concept of mea¬ 
suring the global 21cm signal with an interferometer. 
We established general design principles for a global sig¬ 
nal interferometer as well as a general framework for 
data analysis. Numerical forecasts confirmed the via¬ 
bility of an interferometric measurement of the global 
signal, with our fiducial interferometer performing com¬ 
parably to conventional single-element experiments. 

Balancing sensitivity requirements and foreground 
mitigation considerations, we found that an optimal ar¬ 
ray design consists of a small grid of closely packed an¬ 
tennas, each with a FWHM beam size of ~ 40° at the 
lowest frequencies. We chose a two-step process for our 
general analysis method: first we estimated the spatial 
monopole at each frequency channel; then we combined 
the estimates into a single global signal spectra and per¬ 
formed a final foreground subtraction. During the first 
step, we found that overly aggressive downweightings of 
angular foreground modes led to spectral features that 
compromised our ability to remove foregrounds. Based 
on this, we recommend a “simple is best” approach, 
where one essentially performs a linear fit for the best- 
fit monopole given a set of visibilities weighted by their 
sensitivity to the u = 0 mode. In the second step, we 
found that fitting the single-element-derived spectra to 
6 th order logarithmic polynomials were sufficient to re¬ 
duces foreground residuals to acceptable levels, whereas 
the interferometer-derived spectra required 7th order log¬ 
arithmic polynomials. However, the final parameter er¬ 
rors are comparable between the two experiment types, 
and give tight constraints on the pre-reionization era and 
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Reionization; N poly =6 for interferometer; N poly =5 for single element 
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Fig. 14.— Pairwise parameter contours for interferometer and single element experiments probing reionization. For the interferometer, 
the orange contours correspond to the 95% confidence regions, and the white areas within the orange contours correspond to the 68% 
confidence region. For the single-element experiment, the outer solid black contour line bounds the 95% confidence region, while the inner 
black contour bounds the 68% confidence region. The black dot locates the fiducial value to be recovered and thus indicates the bias in 
each detection. The interferometer experiment used an 6 X 6 array of antennas with a primary beam FWHM of 38.7° at 100 MHz. Both 
experiments had an integration time of 300 hours. For the interferometric measurement, a 6th order logarithmic polynomial was needed to 
model the foreground spectra sufficiently well, whereas with the single element experiment only a 5th order polynomial was required. 


reionization itself under reasonably non-pessimistic sce¬ 
narios. 

Future work must address a number of systematic is¬ 
sues that will inevitably arise with an experiment in the 
manner des cribed in this paper. From our discussion in 
Section[2Tj it is clear that an interferometer’s response to 
the monopole will depend sensitively on antenna design. 
Our recommendation of a closely packed array configu¬ 
ration will also likely require careful engineering atten¬ 
tion to reduce the possibility of mutual coupling between 
adjacent antenna elements. Additionally, ionospheric 
fluctuations can cause systematics that do not integrate 
down with time (Datta et al. 2014). Luckily, these sys¬ 
tematics are reasonably spectrally smooth and thus one 
may be hopeful that techniques can be developed to mit¬ 
igate them. In any case, we have shown that an interfer¬ 
ometric measurement of the global signal, while unusual, 
has the potential to rival those from single-element ex¬ 
periments. Importantly, certain classes of systematics, 
such as thermal noise bias, that are absent from interfer¬ 
ometers. (Of course, thermal noise variance is present 
in both types of experiment). Given the technical chal¬ 


lenges of 21 cm cosmology, one should thus explore as 
many complementary experimental approaches as possi¬ 
ble. Doing so will maximize the chances of a near-term 
detection of the cosmological 21 cm signal, providing a 
crucial first step towards an exquisite understanding of 
an excitingly unexplored portion of our cosmic timeline. 
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Pre-reionization dip; N poly = 7 for interferometer; N poiy = 6 for single element 



Fig. 15.-— Same as Figure [Ti] but for experiments targeting the pre-reionization dip. Primary beams are set to have a FWHM of 38.7° 
at 50 MHz. The errors from the interferometric set-up are slightly smaller, thanks to our choice of a slightly larger-than-necessary array as 
our fiducial model. 

APPENDIX 

A. PROOF OF MINIMUM-VARIANCE PROPERTY OF GLOBAL SIGNAL ESTIMATOR 


In thi s Ap pendix, we provide a constructive proof that choosing M to be diagonal (as we do in this paper starting in 
Section 3.31 minimizes the variance of our spherical harmonic mode estimator x, provided H is selected to be N _1 in 
Equation (31). Given this choice, Equation (32) for the covariance S of x reduces to MBMf where B = CfN -1 Q. 
With this notation, the window function matrix becomes W = MB. 

To derive a minimum-variance estimator, we minimize the diagonal elements of £ subject to the constraint that the 
window functions satisfy W/, = (MB)jj = 1. Introducing a Lagrange multiplier A *, we seek to minimize the quantity 


L=(MBMt) ij -A,(MB) li . 

Differentiating with respect to the real and imaginary parts of a generic component Mj*, of M, one obtains 


(Al) 


and 


dL 

SRcMjfe 


(BM f + MB)j t - AjB fc j = 0 


dL 

cdmMjfc 


(BMt - MB), fc - AjB,y = 0, 


(A2) 

(A3) 


where we have set each expression to zero in order to perform a minimization. Now, recall that B is Hermitian. 
However, the combination BMt _ MB is by construction anti-Hermitian. For the second equation to hold, then, 
we require that BMf = MB, so that the anti-Hermitian portion vanishes identically. With this, the first equation 
reduces to 2MB = AB, where Ay = A iSij. Acting on both sides with B' 1 then gives M = A/2, which tells us 
that M must be diagonal. To fix the values of the Lagrange multipliers along the diagonal, we use our constraint 
W a = (MB),;,; = 1 to obtain My = <5^/ B;,. Recalling the definition of B, we see that this is precisely the form of 
Equation (|37|, completing our proof. 
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B. GLOBAL SIGNAL FISHER MATRIX FORECASTING METHODS 


In this Appendix, we review the Fisher matrix formalism used in Pritchard & Loeb (2010) and Bernardi et al. (20141 
to forecast the error bars and biases in a global signal measurement. We do not claim any originality here, and include 
a description of the formalism only for completeness and consistency of notation. 


Suppose we group our final estimate of the global signal T$(y) into a vector T, so that each component corresponds 
to the measured global signal value at a particular frequency. With this notation, the Fisher information matrix is 
defined as 


i.’ — 

c ij — 


d' 2 lnJz?(0|T) 


dOidBi 


(Bl) 


where Jz?(0|T) is the likelihood function of a set of model parameters 0 given the measurements T, with 0q denoting 
a set of fiducial values for the parameters. Now, recal l tha t T contains the measured global signal spectrum prior 
to the smooth foreground fitting described in Section |3.5| It therefore contains not only the cosmological signal 
(parameterized by one of the forms given in Section [4]), but also foregrounds and noise, and may be written as 

T = T cosmo + Tf g + <t = + ^Tf g + <t , (B2) 


where T cosmo is the cosmological signal, Tf g the foregrounds, and <x the residual instrumental noise. In the second 
equality, we se parated the foreground contribution into a portion accounted for by the smooth spectrum fits described 
in Section 


3.5 


and a residual, (5Tf g . The former contribution is combined with the cosmological signal to give a model 
(i(0) for our spectrum. Since this model contains our foreground fits , the para meter vector 0 records not only the 
astrophysical/cosmological parameters such as those in Equations (53) and (541, but also the foreground parameters 
in Equation (20). Assuming that the instrumental noise is Gaussian and has zero mean and covariance II = (erer 4 ), 
the Fisher matrix takes the form 


F- ■ — 

- 1 - rn — 


'V TT -i dfi 


80 , 


n 


80 i 


(B3) 


Once the Fisher matrix has been computed, the smallest possible error Adi (in the information theoretic sense) on 
the parameter 9i is obtained by calculating A 8i = \J (F -1 )^. While an actual experiment may not necessarily deliver 
error bars that are as tight as this, the predictions of the Fisher matrix formalism are nonetheless a useful guide for 
experimental design. 

Prior information can also be incorporated into our parameter estimates. If observations from other probes have 
already constrained the *th parameter to within an error of this can be accounted for by adding e~ 2 to the ith 
diagonal element of the Fisher matrix. The revised parameter errors will be smaller not just for the *th parameter, 
but for all the other parameters as well, since knowing the ith parameter better can help to break degeneracies. 

So far, we have only concerned ourselves with the variance of the final parameter constraints. However, foreground 
residuals will cause more than a spread in the parameter fits—they will also bias the fits in a systematic way. This 
bias is given by 

S0 i = Y / (F~ 1 ) ij ^ : Tl- 1 ST fs . (B4) 

j 3 


In Section El <5xf g is obtained by running Monte Carlo simulations. We generate realizations of the foreground sky, 
which are tnen fed through a simulation of a measurement and data analysis. Averaging over simulations, the resulting 
spectra are then fit to logarithmic polynomials. The residuals (i.e., <5Tf g ) are given by Equation (20). 
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